Abstract

Motivated by the broadcast view of the interference channel, the new problem of communication
with disturbance constraints is formulated. The rate-disturbance region is established
for the single constraint case and the optimal encoding scheme turns out to be the same as 
the Han-Kobayashi scheme for the two user-pair interference channel. This result is extended 
to the Gaussian vector (MIMO) case. For the case of communication with two disturbance 
constraints, inner and outer bounds on the rate-disturbance region for a deterministic model 
are established. The inner bound is achieved by an encoding scheme that involves rate splitting, 
Marton coding, and superposition coding, and is shown to be optimal in several nontrivial cases. 
This encoding scheme can be readily applied to discrete memoryless interference channels and 
motivates a natural extension of the Han-Kobayashi scheme to more than two user pairs. 



I. Introduction 

Alice wishes to communicate a message to Bob while causing the least disturbance to nearby
Dick, Diane, and Diego, who are not interested in the communication from Alice. Assume a
discrete memoryless broadcast channel $p(y, z_1, \ldots, z_K \mid x)$ between Alice $X$, Bob $Y$, and their
preoccupied friends $Z_1, \ldots, Z_K$, as depicted in Figure 1. We measure the disturbance at side
receiver $Z_j$ by the amount of undesired information rate $(1/n) I(X^n; Z_j^n)$ originating from the
sender $X$, and require this rate not to exceed $R_{d,j}$ in the limit. The problem is to determine the
optimal trade-off between the message communication rate $R$ and the disturbance rates $R_{d,j}$.
Figure 1. Communication system with disturbance constraints.



This communication with disturbance constraints problem is motivated by the broadcast side 
of the interference channel in which each sender wishes to communicate a message only to 
one of the receivers while causing the least disturbance to the other receivers. However, in this 
paper, which is an extended version of [1], we focus on studying the problem of communication 
with disturbance constraints itself. The application of the coding scheme developed in this paper 
to deterministic interference channels with more than two user pairs is discussed in [2]. 
For a single disturbance constraint, we show that the optimal encoding scheme is rate splitting
and superposition coding, which is the same as the Han-Kobayashi scheme for the two user-pair
interference channel [3, 4]. This motivates us to study communication with more than one
disturbance constraint, with the hope of finding good coding schemes for interference channels
with more than two user pairs. To this end, we establish inner and outer bounds on the
rate-disturbance region for the deterministic channel model with two disturbance constraints that are
tight in some nontrivial special cases. In the following section, we provide the needed definitions and
present an extended summary of our results. The proofs are presented in subsequent sections,
with some parts deferred to the Appendix.

II. Definitions and main results 

Consider the discrete memoryless communication system with $K$ disturbance constraints
(henceforth referred to as DMC-$K$-DC) depicted in Figure 1. The channel consists of $K + 2$
finite alphabets $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}_j$, $j \in [1:K]$, and a collection of conditional pmfs $p(y, z_1, \ldots, z_K \mid x)$.

A $(2^{nR}, n)$ code for the DMC-$K$-DC consists of the message set $[1:2^{nR}]$, an encoding function
$x^n: [1:2^{nR}] \to \mathcal{X}^n$, and a decoding function $\hat{m}: \mathcal{Y}^n \to [1:2^{nR}]$. We assume that the message
$M$ is uniformly distributed over $[1:2^{nR}]$. A rate-disturbance tuple $(R, R_{d,1}, \ldots, R_{d,K}) \in \mathbb{R}_+^{K+1}$
is achievable for the DMC-$K$-DC if there exists a sequence of $(2^{nR}, n)$ codes such that

$$
\begin{aligned}
\lim_{n \to \infty} \mathsf{P}(\hat{M} \neq M) &= 0,\\
\limsup_{n \to \infty} \tfrac{1}{n} I(X^n; Z_j^n) &\le R_{d,j}, \quad j \in [1:K].
\end{aligned}
$$

The rate-disturbance region $\mathscr{R}$ of the DMC-$K$-DC is the closure of the set of all achievable
tuples $(R, R_{d,1}, \ldots, R_{d,K})$.

Remark 1. Like the message rate $R$, the disturbance rates $R_{d,j}$, for $j \in [1:K]$, are measured in
units of bits per channel use. (We use logarithms of base 2 throughout.)

Remark 2. The measure of disturbance $(1/n) I(X^n; Z_j^n)$ can be expanded as $(1/n) H(Z_j^n) -
(1/n) H(Z_j^n \mid X^n)$. The first term is the entropy rate of the received signal $Z_j$ and is caused by
both the transmission itself and by noise inherent to the channel. Subtracting the second term
separates out the noise part. (For channels with additive white noise, e.g., the Gaussian case,
the second term is exactly the differential entropy of each noise sample.)

Remark 3. Our results remain essentially true if disturbance is measured by $(1/n) H(Z_j^n)$ instead.
If the channel is deterministic, the two measures coincide.

Remark 4. The disturbance constraint $(1/n) I(X^n; Z_j^n) \le R_{d,j}$ is reminiscent of the information
leakage rate constraint for the wiretap channel [5, 6], that is, $(1/n) I(M; Z_j^n) \le R_{\mathrm{leak}}$. Replacing
$M$ with $X^n$, however, dramatically changes the problem and the optimal coding scheme. In the
wiretap channel, the key component of the optimal encoding scheme is randomized encoding,
which helps control the leakage rate $(1/n) I(M; Z_j^n)$. Such randomization reduces the achievable
transmission rate for a given disturbance constraint and hence is not desirable in our setting.

The rate-disturbance region is not known in general. In this paper we establish the following 
results. 

A. Rate-disturbance region for a single disturbance constraint 

Consider the case with a single disturbance constraint, i.e., $K = 1$, and relabel $Z_1$ as $Z$ and
$R_{d,1}$ as $R_d$. We fully characterize the rate-disturbance region for this case.
Theorem 1. The rate-disturbance region $\mathscr{R}$ of the DMC-1-DC is the set of rate pairs $(R, R_d)$
such that

$$
\begin{aligned}
R &\le I(X; Y),\\
R_d &\ge I(X; Z \mid U),\\
R - R_d &\le I(X; Y \mid U) - I(X; Z \mid U),
\end{aligned}
$$

for some pmf $p(u, x)$ with $|\mathcal{U}| \le |\mathcal{X}| + 1$.

Let $\mathscr{R}(U, X)$ be the rate region defined by the rate constraints in the theorem for a fixed
joint pmf $(U, X) \sim p(u, x)$. This rate region is illustrated in Figure 2. The rate-disturbance
region is simply the union of these regions over all $p(u, x)$ and is convex without the need for
a time-sharing random variable.




Figure 2. Example of $\mathscr{R}(U, X)$, the constituent region of $\mathscr{R}$ (horizontal axis $R$, with $I(X;Y \mid U)$ and $I(X;Y)$ marked).
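The following Python sketch makes the constituent region concrete. For a finite joint pmf $p(u, x)$ and a channel given by the conditional pmfs $p(y \mid x)$ and $p(z \mid x)$, it evaluates the three information quantities of Theorem 1. It then tests whether a given pair $(R, R_d)$ lies in $\mathscr{R}(U, X)$. The dictionary representation and all function names are illustrative choices, not part of the paper. Logarithms are base 2.

```python
from collections import defaultdict
from math import log2


def entropy(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx (a tuple of positions)."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out


def cond_mi(joint, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C) for position tuples a, b, c."""
    def h(idx):
        return entropy(marginal(joint, idx))
    return h(a + c) + h(b + c) - h(a + b + c) - h(c)


def theorem1_quantities(p_ux, p_y_given_x, p_z_given_x):
    """Return (I(X;Y), I(X;Y|U), I(X;Z|U)) for U -> X -> (Y, Z).

    Each quantity involves only one marginal channel, so p(y, z|x) is
    replaced by the product p(y|x) p(z|x) for convenience.
    """
    joint = defaultdict(float)  # positions: 0 = u, 1 = x, 2 = y, 3 = z
    for (u, x), p in p_ux.items():
        for y, py in p_y_given_x[x].items():
            for z, pz in p_z_given_x[x].items():
                joint[(u, x, y, z)] += p * py * pz
    U, X, Y, Z = (0,), (1,), (2,), (3,)
    return cond_mi(joint, X, Y), cond_mi(joint, X, Y, U), cond_mi(joint, X, Z, U)


def in_constituent_region(rate, dist, quantities, tol=1e-12):
    """Test the three inequalities of Theorem 1 for one fixed p(u, x)."""
    i_xy, i_xy_u, i_xz_u = quantities
    return (rate <= i_xy + tol
            and dist >= i_xz_u - tol
            and rate - dist <= i_xy_u - i_xz_u + tol)
```

As an example, take a hypothetical deterministic channel on $\{0, 1, 2, 3\}$ with $Y = X$ and $Z = \lfloor X/2 \rfloor$, with uniform $X$ and $U = Z$. The three quantities are then $(2, 1, 0)$. So $(1, 0)$ lies in $\mathscr{R}(U, X)$, while $(1.5, 0)$ does not. Taking the union over many choices of $p(u, x)$ approximates $\mathscr{R}$.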

The proof of Theorem 1 is given in Subsections III-A and III-B. Achievability is established
using rate splitting and superposition coding. Receiver Y decodes the satellite codeword while 
receiver Z distinguishes only the cloud center. Note that this encoding scheme is identical to 
the Han-Kobayashi scheme for the two user-pair interference channel [3, 4]. 

We now consider three interesting special cases. 

1) Deterministic channel: Assume that Y and Z are deterministic functions of X. We show 
that the rate-disturbance region in Theorem 1 reduces to the following. 

Corollary 1. The rate-disturbance region for the deterministic channel with one disturbance
constraint is the set of rate pairs $(R, R_d)$ such that

$$
\begin{aligned}
R &\le H(Y),\\
R - R_d &\le H(Y \mid Z),
\end{aligned}
$$

for some pmf $p(x)$.

Clearly, this region is convex. Alternatively, the region can be written as the set of rate pairs
$(R, R_d)$ such that

$$
\begin{aligned}
R &\le H(Y \mid Q),\\
R_d &\ge I(Y; Z \mid Q),
\end{aligned}
$$

for some joint pmf $p(q, x)$ with $|\mathcal{Q}| \le 2$. Corollary 1 and the alternative description of the
region are established by substituting $U = Z$ in the region of Theorem 1 and simplifying the
resulting region, as detailed in Subsection III-C.

Remark 5. Consider the injective deterministic interference channel with two user pairs depicted
in Figure 3. Here, $g_{ij}$ is a function that models the link from transmitter $i$ to receiver $j$, for
$i, j \in \{1, 2\}$. The combining functions $f_j$ are assumed to be injective in each argument. This
setting is a special case of the channel investigated in [7]. To see this, merge $g_{11}$
and $f_1$ of Figure 3 into a function $f_1'$ that maps $(X_1, Z_2)$ to $Y_1$. Likewise, define the function
$f_2'$ as the merger of $g_{22}$ and $f_2$. The modified combining functions $f_1'$ and $f_2'$ are injective
in $Z_2$ and $Z_1$, respectively, and therefore satisfy the assumptions in [7]. It follows that the
Han-Kobayashi scheme, in which the transmitters use superposition codebooks generated according
to $p(z_1) p(x_1 \mid z_1)$ and $p(z_2) p(x_2 \mid z_2)$, achieves the capacity region of the channel in Figure 3.

On the other hand, Corollary 1 shows that the same encoding scheme achieves the
disturbance-constrained capacity for the channels $X_1 \to (Y_1', Z_1)$ and $X_2 \to (Y_2', Z_2)$, shown as dashed
boxes in Figure 3. Here, $Y_1'$ and $Y_2'$ are the desired receivers, and $Z_1$ and $Z_2$ are the side
receivers associated with the disturbance constraints. Note that decodability of the desired messages
at receivers $Y_1$ and $Y_2$ in the interference channel certainly implies decodability at $Y_1'$ and $Y_2'$,
respectively, in the channels with a disturbance constraint.



Mi Xi 



M, 



Xo 




Yj Mi 



Y 2 -> M 2 



Figure 3. Injective deterministic interference channel with two user pairs. 



Example 1. Consider the deterministic channel depicted in Figure 4(a) and its rate-disturbance
region in Figure 4(b). Note that rates $R \le 1$ can be achieved with zero disturbance rate by
restricting the transmission to input symbols $\{0, 1\}$ (or $\{2, 3\}$), which map to different symbols
at $Y$ but are indistinguishable at $Z$. On the other hand, for sufficiently large $R_d$, the disturbance
constraint becomes inactive and $R$ is bounded only by the unconstrained capacity $\log 3$. In
addition to the optimal region achieved by superposition coding, the figure also shows the
strictly suboptimal region achieved by simple non-layered random codes.

2) Gaussian channel: Consider the problem of communication with one disturbance constraint
for the Gaussian channel

$$
Y = X + W_1, \qquad Z = X + W_2,
$$

where the noise components are $W_1 \sim \mathcal{N}(0, 1)$ and $W_2 \sim \mathcal{N}(0, N)$. Assume an average power constraint $P$
on the transmitted signal $X$.

The case $N < 1$ is not interesting, since then $Y$ is a degraded version of $Z$ and the disturbance
rate is simply given by the data rate $R$. If $N > 1$, $Z$ is a degraded version of $Y$, and the
rate-disturbance region reduces to the following.

Corollary 2. The rate-disturbance region of the Gaussian channel with parameters $P > 0$ and
$N > 1$ is the set of rate pairs $(R, R_d)$ such that

$$
\begin{aligned}
R &\le \mathrm{C}(\alpha P),\\
R_d &\ge \mathrm{C}(\alpha P / N),
\end{aligned}
$$

for some $\alpha \in [0, 1]$, where $\mathrm{C}(x) = (1/2) \log(1 + x)$ for $x \ge 0$.

Achievability is proved using Gaussian codes with power $\alpha P$. The converse follows by
defining $\alpha^* \in [0, 1]$ such that $R = \mathrm{C}(\alpha^* P)$ and applying the vector entropy power inequality
to $Z^n = Y^n + \tilde{W}_2^n$, where $\tilde{W}_2 \sim \mathcal{N}(0, N - 1)$ is the excess noise. The details are given in
Subsection III-D. Note that this is a degenerate form of the Han-Kobayashi scheme, because the
constraint from the multiple access side of the interference channel is not taken into consideration.

Figure 4. Deterministic example with one disturbance constraint: (a) channel; (b) rate-disturbance region.
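The short Python sketch below traces the boundary of the region in Corollary 2 by sweeping the power fraction $\alpha$. It also evaluates the minimum disturbance rate for a given $R$, by solving $R = \mathrm{C}(\alpha^* P)$ for $\alpha^*$. The function names are hypothetical. Logarithms are base 2, as in the rest of the paper.

```python
from math import log2


def C(x):
    """Gaussian capacity function C(x) = (1/2) log2(1 + x) for x >= 0."""
    return 0.5 * log2(1.0 + x)


def boundary_point(alpha, P, N):
    """Point (C(alpha P), C(alpha P / N)) of Corollary 2 for power fraction alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return C(alpha * P), C(alpha * P / N)


def min_disturbance(R, P, N):
    """Smallest R_d such that (R, R_d) is in the region of Corollary 2."""
    if N <= 1:
        raise ValueError("Corollary 2 assumes N > 1")
    if not 0.0 <= R <= C(P) + 1e-12:
        raise ValueError("R must lie in [0, C(P)]")
    alpha_star = (2.0 ** (2.0 * R) - 1.0) / P  # solves R = C(alpha_star * P)
    return C(alpha_star * P / N)
```

Since $\mathrm{C}$ is increasing, eliminating $\alpha$ gives the lower boundary $R_d^{\min}(R) = \mathrm{C}\bigl((2^{2R} - 1)/N\bigr)$ for $0 \le R \le \mathrm{C}(P)$. For instance, with $P = N = 3$, the rate $R = \mathrm{C}(P) = 1$ requires $R_d \ge 1/2$.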

3) Vector Gaussian channel: Now consider the vector Gaussian channel with one disturbance
constraint

$$
Y = X + W_1, \qquad Z = X + W_2,
$$

where $X \in \mathbb{R}^n$ and the noise components are $W_1 \sim \mathcal{N}(0, K_1)$ and $W_2 \sim \mathcal{N}(0, K_2)$ for some positive
definite covariance matrices $K_1, K_2 \in \mathbb{R}^{n \times n}$. Assume an average transmit power constraint
$\operatorname{tr}(K_x) \le P$, where $K_x = \mathsf{E}(X X^T)$ is the covariance matrix of $X$. This case is not degraded
in general.

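Theorem 2. The rate-disturbance region of the Gaussian vector channel with parameters $P$,
$K_1$, and $K_2$ is the convex hull of the set of pairs $(R, R_d)$ such that

$$
\begin{aligned}
R &\le \frac{1}{2} \log \frac{|K_u + K_v + K_1|}{|K_1|},\\
R - R_d &\le \frac{1}{2} \log \frac{|K_v + K_1|\,|K_2|}{|K_1|\,|K_v + K_2|},\\
R_d &\ge \frac{1}{2} \log \frac{|K_v + K_2|}{|K_2|},
\end{aligned}
$$

for some positive semidefinite matrices $K_u, K_v \in \mathbb{R}^{n \times n}$ with $\operatorname{tr}(K_u + K_v) \le P$.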

Achievability of this rate-disturbance region is shown by applying Theorem 1. Using the 
discretization procedure in [8], it can be shown that the theorem continues to hold with the 
power constraint additionally applied to the set of permissible input distributions. The claimed 
region then follows by considering the special case where the input distribution $p(u, x)$ is jointly
Gaussian. To prove the converse, we use an extremal inequality in [9] to show that Gaussian
input distributions are sufficient. The details of the proof are given in Subsection III-E.
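The Python sketch below is a numerical companion to Theorem 2. For given $K_u$, $K_v$, $K_1$, $K_2$, it evaluates the three right-hand sides and returns the corner points $A$ and $B$ of the constituent region, which correspond to equations (17)-(20) in Subsection III-E. To stay dependency-free, it uses a small hand-written determinant routine; a numerical linear algebra library would be the natural choice in practice. The function names are assumptions made for illustration. $K_1$ and $K_2$ must be positive definite for the log-ratios to be finite.

```python
from math import log2


def det(m):
    """Determinant of a small square matrix (list of lists), computed by
    Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    d = 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[piv][k] == 0.0:
            return 0.0
        if piv != k:
            a[k], a[piv] = a[piv], a[k]
            d = -d
        d *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return d


def mat_add(*ms):
    """Entrywise sum of equally sized matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*ms)]


def half_log_ratio(num, den):
    """(1/2) log2(|num| / |den|); both matrices must be positive definite."""
    return 0.5 * log2(det(num) / det(den))


def theorem2_terms(Ku, Kv, K1, K2):
    """(I(X;Y), I(X;Y|U), I(X;Z|U)) for X = U + V, U ~ N(0, Ku), V ~ N(0, Kv)."""
    return (half_log_ratio(mat_add(Ku, Kv, K1), K1),
            half_log_ratio(mat_add(Kv, K1), K1),
            half_log_ratio(mat_add(Kv, K2), K2))


def corner_points(Ku, Kv, K1, K2):
    """Corner points A = (I(X;Y), I(X;Y) - I(X;Y|U) + I(X;Z|U)) and
    B = (I(X;Y|U), I(X;Z|U)) of the constituent region."""
    i_xy, i_xy_u, i_xz_u = theorem2_terms(Ku, Kv, K1, K2)
    point_a = (i_xy, i_xy - i_xy_u + i_xz_u)
    point_b = (i_xy_u, i_xz_u)
    return point_a, point_b
```

Evaluating these points over a family of feasible pairs $(K_u, K_v)$ with $\operatorname{tr}(K_u + K_v) \le P$, and taking the convex hull, gives a numerical inner approximation of the region in Theorem 2.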



B. Inner and outer bounds for the deterministic channel with two disturbance constraints 

The correspondence between optimal encoding for the channel with one disturbance constraint 
and the Han-Kobayashi scheme for the interference channel suggests that the optimal coding 
scheme for K disturbance constraints may provide an efficient (if not optimal) scheme for the 
interference channel with more than two user pairs. This is particularly the case for extensions 
of the two user-pair injective deterministic interference channel for which Han-Kobayashi is 
optimal [7] (see Remark 5). As such, we restrict our attention to the deterministic version of 
the DMC-2-DC. 

First, we establish the following inner bound on the rate-disturbance region. 

Theorem 3 (Inner bound). The rate-disturbance region $\mathscr{R}$ of the deterministic channel with
two disturbance constraints is inner-bounded by the set of rate triples $(R, R_{d,1}, R_{d,2})$ such that

$$
\begin{aligned}
R &\le H(Y), &&(1)\\
R_{d,1} + R_{d,2} &\ge I(Z_1; Z_2 \mid U), &&(2)\\
R - R_{d,1} &\le H(Y \mid Z_1, U), &&(3)\\
R - R_{d,2} &\le H(Y \mid Z_2, U), &&(4)\\
R - R_{d,1} - R_{d,2} &\le H(Y \mid Z_1, Z_2, U) - I(Z_1; Z_2 \mid U), &&(5)\\
2R - R_{d,1} - R_{d,2} &\le H(Y \mid Z_1, Z_2, U) + H(Y \mid U) - I(Z_1; Z_2 \mid U), &&(6)
\end{aligned}
$$

for some pmf $p(u, x)$.


The inner bound is convex. The expression $I(Z_1; Z_2 \mid U)$ appears in three of the inequalities.
As in Marton coding for the 2-receiver broadcast channel with a common message, it is
the penalty incurred in encoding independent messages via correlated sequences. The region
$\mathscr{R}(U, X)$ defined by the inequalities in the theorem for a fixed $p(u, x)$ is illustrated in Figure 5.



Figure 5. Region $\mathscr{R}(U, X)$ for Theorem 3. Each face is annotated by the inequality that defines it.



Remark 6. The right-hand side of condition (6) can be equivalently expressed as

$$
\begin{aligned}
&H(Y \mid Z_1, Z_2, U) + H(Y \mid U) - I(Z_1; Z_2 \mid U)\\
&\quad = H(Y \mid Z_1, U) + H(Y \mid Z_2, U) - I(Z_1; Z_2 \mid U, Y).
\end{aligned}
$$

Since $I(Z_1; Z_2 \mid U, Y) \ge 0$, this shows that the condition is stricter than the sum of conditions (3) and (4).
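For a finite deterministic channel, the right-hand sides of (1)-(6) can be evaluated directly from $p(u, x)$. The Python sketch below does this and checks membership of a rate triple for one fixed $p(u, x)$. It also returns $I(Z_1; Z_2 \mid U, Y)$, so that the identity in Remark 6 can be checked numerically. Representing $Y$, $Z_1$, $Z_2$ as Python functions of $x$ is an illustrative assumption, as are all names.

```python
from collections import defaultdict
from math import log2


def joint_entropy(p_ux, *fns):
    """H(f_1(U, X), ..., f_k(U, X)) in bits for a pmf p_ux = {(u, x): prob}."""
    pmf = defaultdict(float)
    for (u, x), p in p_ux.items():
        pmf[tuple(f(u, x) for f in fns)] += p
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def theorem3_bounds(p_ux, y, z1, z2):
    """Right-hand sides of (1)-(6) for deterministic outputs y(x), z1(x), z2(x),
    plus I(Z1;Z2|U,Y) for checking Remark 6."""
    def U(u, x):
        return u

    def Y(u, x):
        return y(x)

    def Z1(u, x):
        return z1(x)

    def Z2(u, x):
        return z2(x)

    def H(*fns):                  # joint entropy of the selected variables
        return joint_entropy(p_ux, *fns)

    def cond(a, *given):          # H(A | given)
        return H(a, *given) - H(*given)

    def mi(a, b, *given):         # I(A; B | given) = H(A | given) - H(A | B, given)
        return cond(a, *given) - cond(a, b, *given)

    i12 = mi(Z1, Z2, U)
    h_all = cond(Y, Z1, Z2, U)
    return {
        1: H(Y),
        2: i12,
        3: cond(Y, Z1, U),
        4: cond(Y, Z2, U),
        5: h_all - i12,
        6: h_all + cond(Y, U) - i12,
        "I(Z1;Z2|U,Y)": mi(Z1, Z2, U, Y),
    }


def in_inner_bound(R, Rd1, Rd2, b, tol=1e-12):
    """Check inequalities (1)-(6) of Theorem 3 for one fixed p(u, x)."""
    return (R <= b[1] + tol
            and Rd1 + Rd2 >= b[2] - tol
            and R - Rd1 <= b[3] + tol
            and R - Rd2 <= b[4] + tol
            and R - Rd1 - Rd2 <= b[5] + tol
            and 2 * R - Rd1 - Rd2 <= b[6] + tol)
```

As an example, take a hypothetical channel on $\{0, 1, 2, 3\}$ with $Y = X$, $Z_1 = \lfloor X/2 \rfloor$, and $Z_2 = X \bmod 2$, with uniform $X$ and constant $U$. The triple $(1, 1, 0)$ satisfies (1)-(6), while $(1, 0, 0)$ violates (5).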

The encoding scheme for Theorem 3 involves rate splitting, Marton coding, and superposition
coding. The analysis of the probability of error, however, is complicated by the fact that receiver
$Y$ wishes to decode all parts of the message, as detailed in Subsection IV-A. Receivers $Z_1$ and
$Z_2$ each observe a satellite codeword from a superposition codebook.

Remark 7. The encoding scheme underlying the inner bound of Theorem 3 can be readily 
extended to the general (non-deterministic) DMC-2-DC. 

To complement the inner bound, we establish the following outer bound on the rate-disturbance 
region of the deterministic channel with two disturbance constraints. 

Theorem 4 (Outer bound). If a rate triple $(R, R_{d,1}, R_{d,2})$ is achievable for the deterministic
channel with two disturbance constraints, then it must satisfy the conditions

$$
\begin{aligned}
R &\le H(Y \mid Q),\\
R_{d,1} &\ge I(Y; Z_1 \mid Q),\\
R_{d,2} &\ge I(Y; Z_2 \mid Q),
\end{aligned}
$$

for some pmf $p(q, x)$ with $|\mathcal{Q}| \le 3$.

The proof of this outer bound is given in Subsection IV-B. Note that this outer bound is very
similar in form to the alternative description of Corollary 1 for the single-constraint deterministic
case.

The inner bound in Theorem 3 and the outer bound in Theorem 4 coincide in some special
cases. To discuss these, we introduce the following notation. Since all channel outputs are
functions of $X$, they can be equivalently thought of as set partitions of the input alphabet $\mathcal{X}$. Set
partitions form a partially ordered set (poset) under the refinement relation. Since this poset is a
complete lattice [10], the following concepts are well defined. For two set partitions (functions)
$f$ and $g$, let $f \preceq g$ denote that $f$ is a refinement of $g$ (equivalently, $g$ is degraded with respect
to $f$), let $f \wedge g$ be the intersection of the two set partitions (the function that returns both $f$
and $g$), and let $f \vee g$ denote the finest set partition of which both $f$ and $g$ are refinements (the
Gács-Körner-Witsenhausen common part of $f$ and $g$, cf. [11, 12]).

The inner bound of Theorem 3 coincides with the outer bound of Theorem 4 if $Z_1$ or $Z_2$ is
a degraded version of $Y \wedge (Z_1 \vee Z_2)$, i.e., if the output $Y$ together with the common part of $Z_1$
and $Z_2$ determines $Z_1$ or $Z_2$ completely.

Theorem 5. The rate-disturbance region $\mathscr{R}$ of the deterministic channel with two disturbance
constraints is given by the outer bound of Theorem 4 if

$$
Y \wedge (Z_1 \vee Z_2) \preceq Z_1 \quad \text{or} \quad Y \wedge (Z_1 \vee Z_2) \preceq Z_2.
$$

The theorem is proved by specializing Theorem 3, as detailed in Subsection IV-C. In the
case where $Z_1$ or $Z_2$ is a degraded version of $Y$ alone, achievability follows by setting $U = \emptyset$
in Theorem 3. Otherwise, we let $U = Z_1 \vee Z_2$. This is intuitive, since $U$ corresponds to the
common-message step in the Marton encoding scheme.

Example 2. Consider the deterministic channel depicted in Figure 6. The desired receiver output
$Y$ is a refinement of both side receiver outputs $Z_1$ and $Z_2$, and hence Theorem 5 applies.
Figure 7(a) depicts the rate-disturbance region, numerically approximated by evaluating the bounds at each
point of a regular grid over the distributions $p(x)$ and subsequently taking the convex
hull. Figure 7(b) contrasts the single-constraint case (where $R_{d,2}$ is set to infinity and thus
inactive) with the case where both side receivers are under the same disturbance rate constraint
($R_{d,1} = R_{d,2}$). As expected, imposing an additional disturbance constraint can significantly
reduce the achievable message rate. Finally, Figure 7(c) illustrates the trade-off between the
disturbance rates $R_{d,1}$ and $R_{d,2}$ at the two side receivers for a fixed data rate $R$.

We conclude this section by considering another case in which we can fully characterize the
rate-disturbance region of the deterministic channel with two disturbance constraints. If $Z_1$ is a
degraded version of $Z_2$ (or vice versa), the inner bound of Theorem 3 is optimal and simplifies to
the following.

Figure 6. Deterministic channel with two disturbance constraints (Example 2).

Corollary 3. The rate-disturbance region $\mathscr{R}$ of the deterministic channel with two disturbance
constraints with $Z_1 \preceq Z_2$ or $Z_2 \preceq Z_1$ is the set of rate triples $(R, R_{d,1}, R_{d,2})$ such that

$$
\begin{aligned}
R &\le H(Y),\\
R - R_{d,1} &\le H(Y \mid Z_1),\\
R - R_{d,2} &\le H(Y \mid Z_2),
\end{aligned}
$$

for some pmf $p(x)$.

Achievability follows as a special case of Theorem 3. The encoding scheme underlying the
theorem carefully avoids introducing an ordering between the side receiver signals $Z_1$ and $Z_2$,
but such an ordering is naturally given by the channel here. Consequently, the corollary follows by
setting the auxiliary $U$ equal to the output at the degraded side receiver. This turns the encoding
scheme into superposition coding with three layers. The details are given in Subsection IV-D.

Note that the region of Corollary 3 is akin to the deterministic case with one disturbance
constraint in Corollary 1. In both cases, the side receiver signals need not be degraded with
respect to $Y$.
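For finite alphabets, the partition notions used in Theorem 5 and Corollary 3 are easy to compute. In the Python sketch below, each channel output is represented as a function on $\mathcal{X}$. The helper refines(f, g, alphabet) tests $f \preceq g$, and meet(f, g) forms $f \wedge g$. The helper join(f, g, alphabet) computes $f \vee g$ as the connected components of the graph that links inputs agreeing in $f$ or in $g$, using union-find. These helper names are hypothetical; the sketch is meant only to make the definitions concrete.

```python
def refines(f, g, alphabet):
    """True if f is a refinement of g (f ⪯ g), i.e., g(x) is determined by f(x)."""
    seen = {}
    for x in alphabet:
        if seen.setdefault(f(x), g(x)) != g(x):
            return False
    return True


def meet(f, g):
    """f ∧ g: the function returning both f and g."""
    return lambda x: (f(x), g(x))


def join(f, g, alphabet):
    """f ∨ g (Gács-Körner-Witsenhausen common part): the label of the connected
    component of x in the graph linking inputs that agree in f or in g."""
    parent = {x: x for x in alphabet}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for h in (f, g):
        first = {}  # output value -> first input seen with that value
        for x in alphabet:
            root = first.setdefault(h(x), x)
            parent[find(x)] = find(root)
    labels = {x: find(x) for x in alphabet}
    return lambda x: labels[x]


def theorem5_condition(y, z1, z2, alphabet):
    """Check Y ∧ (Z1 ∨ Z2) ⪯ Z1 or Y ∧ (Z1 ∨ Z2) ⪯ Z2."""
    common = meet(y, join(z1, z2, alphabet))
    return refines(common, z1, alphabet) or refines(common, z2, alphabet)
```

When $Y$ refines both $Z_1$ and $Z_2$, as in Example 2, theorem5_condition returns True. The degradedness hypothesis of Corollary 3 can be tested as refines(z1, z2, X) or refines(z2, z1, X).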
Figure 7. Rate-disturbance region for Example 2: (a) rate-disturbance region; (b) single disturbance constraint ($R_{d,1} = R_d$, $R_{d,2} = \infty$) and symmetric disturbance constraint ($R_{d,1} = R_{d,2} = R_d$); (c) contour lines of the rate-disturbance region at constant rate $R$.
III. Proofs for a single disturbance constraint

A. Achievability proof of Theorem 1 
Achievability is proved as follows. 

Codebook generation. Fix a pmf $p(u, x)$.

1) Split the message $M$ into two independent messages $M_0$ and $M_1$ with rates $R_0$ and $R_1$,
respectively. Hence $R = R_0 + R_1$.

2) For each $m_0 \in [1:2^{nR_0}]$, independently generate a sequence $u^n(m_0)$ according to
$\prod_{i=1}^n p(u_i)$.

3) For each $(m_0, m_1) \in [1:2^{nR_0}] \times [1:2^{nR_1}]$, independently generate a sequence $x^n(m_0, m_1)$
according to $\prod_{i=1}^n p(x_i \mid u_i(m_0))$.

Encoding. To send message $m = (m_0, m_1)$, transmit $x^n(m_0, m_1)$.

Decoding. Upon receiving $y^n$, find the unique $(\hat{m}_0, \hat{m}_1)$ such that $(u^n(\hat{m}_0), x^n(\hat{m}_0, \hat{m}_1), y^n) \in
\mathcal{T}_\epsilon^{(n)}(U, X, Y)$.

Analysis of the probability of error. We are using a superposition code over the channel from $X$
to $Y$. Using the law of large numbers and the packing lemma in [8], it can be shown that the
probability of error tends to zero as $n \to \infty$ if

$$
\begin{aligned}
R_1 &< I(X; Y \mid U) - \delta(\epsilon), &&(7)\\
R_0 + R_1 &< I(X; Y) - \delta(\epsilon). &&(8)
\end{aligned}
$$

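Analysis of disturbance rate. We analyze the disturbance rate averaged over codebooks $\mathcal{C}$:

$$
\begin{aligned}
I(X^n; Z^n \mid \mathcal{C}) &\le H(Z^n, M_0 \mid \mathcal{C}) - H(Z^n \mid X^n, \mathcal{C})\\
&= H(M_0) + H(Z^n \mid M_0, \mathcal{C}) - H(Z^n \mid X^n)\\
&\overset{(a)}{\le} nR_0 + H(Z^n \mid U^n) - nH(Z \mid X)\\
&\le nR_0 + nH(Z \mid U) - nH(Z \mid X, U)\\
&= nR_0 + nI(X; Z \mid U)\\
&\le nR_d, &&(9)
\end{aligned}
$$

where (a) follows since $U^n$ is a function of the codebook $\mathcal{C}$ and $M_0$, and the last inequality holds
provided $R_0 + I(X; Z \mid U) \le R_d$. Substituting $R = R_0 + R_1$
and using Fourier-Motzkin elimination on inequalities (7), (8), and (9) completes the proof of
achievability.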
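For completeness, the Fourier-Motzkin step can be written out explicitly. The sketch below treats the inequalities as non-strict and lets $\delta(\epsilon) \to 0$. It also adds the implicit constraints $R_0, R_1 \ge 0$, which are assumed here rather than stated above.

$$
\begin{aligned}
&\text{Start:} && R_1 \le I(X;Y \mid U),\quad R_0 + R_1 \le I(X;Y),\quad R_0 + I(X;Z \mid U) \le R_d,\quad R_0, R_1 \ge 0.\\
&\text{Set } R_0 = R - R_1: && R \le I(X;Y),\quad \max\{0,\ R - R_d + I(X;Z \mid U)\} \le R_1 \le \min\{I(X;Y \mid U),\ R\}.\\
&\text{Eliminate } R_1: && R \le I(X;Y),\quad R_d \ge I(X;Z \mid U),\quad R - R_d \le I(X;Y \mid U) - I(X;Z \mid U).
\end{aligned}
$$

The remaining pairings, $0 \le I(X;Y \mid U)$ and $0 \le R$, are trivially satisfied. After taking the union over $p(u, x)$, what remains is exactly the region of Theorem 1.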

B. Converse of Theorem 1 

Consider a sequence of codes with $P_e^{(n)} \to 0$ as $n \to \infty$ and the joint pmf that it induces
on $(M, X^n, Y^n, Z^n)$ assuming $M \sim \mathrm{Unif}[1:2^{nR}]$. Define the time-sharing random variable
$Q \sim \mathrm{Unif}[1:n]$, independent of everything else. We use the identification $U = (Q, Y_{Q+1}^n, Z^{Q-1})$,
and let $X = X_Q$, $Y = Y_Q$, and $Z = Z_Q$. Note that $(X, Y, Z)$ is consistent with the channel.
Then

$$
R \le I(X; Y) + \epsilon_n
$$

as in the converse proof for point-to-point channel capacity, which uses the same identifications
of random variables. On the other hand,

$$
\begin{aligned}
nR_d &\ge I(X^n; Z^n)\\
&= H(Z^n) - H(Z^n \mid X^n)\\
&= \sum_{i=1}^n \bigl(H(Z_i \mid Z^{i-1}) - H(Z_i \mid X_i)\bigr)\\
&\ge \sum_{i=1}^n H(Z_i \mid Z^{i-1}, Y_{i+1}^n) - nH(Z \mid X)\\
&= nH(Z \mid U) - nH(Z \mid X, U)\\
&= nI(X; Z \mid U).
\end{aligned}
$$

Finally,

$$
\begin{aligned}
n(R_d - R) &\ge I(X^n; Z^n) - nR\\
&\overset{(a)}{\ge} H(Z^n) - H(Z^n \mid X^n) - I(M; Y^n) - n\epsilon_n\\
&\overset{(b)}{=} \sum_{i=1}^n \bigl(H(Z_i \mid Z^{i-1}) - I(M; Y_i \mid Y_{i+1}^n)\bigr) - nH(Z \mid X) - n\epsilon_n\\
&= \sum_{i=1}^n \bigl(H(Z_i \mid Z^{i-1}, Y_{i+1}^n) + I(Y_{i+1}^n; Z_i \mid Z^{i-1})\\
&\qquad\quad - H(Y_i \mid Y_{i+1}^n) + H(Y_i \mid M, Y_{i+1}^n)\bigr) - nH(Z \mid X) - n\epsilon_n\\
&\overset{(c)}{=} \sum_{i=1}^n \bigl(H(Z_i \mid Z^{i-1}, Y_{i+1}^n) + I(Y_i; Z^{i-1} \mid Y_{i+1}^n)\\
&\qquad\quad - H(Y_i \mid Y_{i+1}^n) + H(Y_i \mid X_i)\bigr) - nH(Z \mid X) - n\epsilon_n\\
&= \sum_{i=1}^n \bigl(H(Z_i \mid Z^{i-1}, Y_{i+1}^n) - H(Y_i \mid Z^{i-1}, Y_{i+1}^n)\\
&\qquad\quad + H(Y_i \mid X_i, Z^{i-1}, Y_{i+1}^n)\bigr) - nH(Z \mid X) - n\epsilon_n\\
&= \sum_{i=1}^n \bigl(H(Z_i \mid Z^{i-1}, Y_{i+1}^n) - I(X_i; Y_i \mid Z^{i-1}, Y_{i+1}^n)\bigr) - nH(Z \mid X) - n\epsilon_n\\
&\overset{(d)}{=} nH(Z \mid U) - nI(X; Y \mid U) - nH(Z \mid X, U) - n\epsilon_n\\
&= nI(X; Z \mid U) - nI(X; Y \mid U) - n\epsilon_n,
\end{aligned}
$$

where (a) uses Fano's inequality, (b) single-letterizes the noise term $H(Z^n \mid X^n)$ with equality
due to the memorylessness of the channel, (c) applies Csiszár's sum identity to the second term and
channel memorylessness to the fourth term, and (d) uses the previous definitions of the auxiliary
random variables. Finally, the cardinality bound on $U$ is established using the convex cover
method in [8].
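Step (c) relies on Csiszár's sum identity, $\sum_{i=1}^n I(Y_{i+1}^n; Z_i \mid Z^{i-1}) = \sum_{i=1}^n I(Z^{i-1}; Y_i \mid Y_{i+1}^n)$, which holds for any joint distribution of $(Y^n, Z^n)$. The Python sketch below is only a numerical sanity check, not a proof. It draws a random pmf on binary $(Y^n, Z^n)$ for a small $n$ (an arbitrary choice) and evaluates both sides. The function names are hypothetical.

```python
import itertools
import random
from collections import defaultdict
from math import log2


def entropy_of(joint, idx):
    """Entropy (bits) of the coordinates idx of a pmf over tuples."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def cmi(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C) for coordinate tuples."""
    return (entropy_of(joint, a + c) + entropy_of(joint, b + c)
            - entropy_of(joint, a + b + c) - entropy_of(joint, c))


def csiszar_sides(joint, n):
    """Both sides of the Csiszar sum identity.

    Coordinates 0..n-1 hold Y_1..Y_n and n..2n-1 hold Z_1..Z_n. For the
    0-based time t (i = t + 1 in the paper's notation), ys(t + 1, n) is
    Y_{i+1}^n, zs(t, t + 1) is Z_i, and zs(0, t) is Z^{i-1}.
    """
    def ys(lo, hi):
        return tuple(range(lo, hi))

    def zs(lo, hi):
        return tuple(range(n + lo, n + hi))

    lhs = sum(cmi(joint, ys(t + 1, n), zs(t, t + 1), zs(0, t)) for t in range(n))
    rhs = sum(cmi(joint, zs(0, t), ys(t, t + 1), ys(t + 1, n)) for t in range(n))
    return lhs, rhs


def random_joint(n, seed=0):
    """Randomly drawn pmf of (Y^n, Z^n) with binary alphabets."""
    rng = random.Random(seed)
    outcomes = list(itertools.product((0, 1), repeat=2 * n))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}
```

For example, csiszar_sides(random_joint(3), 3) returns two numbers that agree up to floating-point rounding.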
C. Proof of Corollary 1

Using the deterministic nature of the channel, the region in Theorem 1 reduces to the set of
rate pairs $(R, R_d)$ such that

$$
\begin{aligned}
R &\le H(Y), &&(10)\\
R_d &\ge H(Z \mid U), &&(11)\\
R_d &\ge R + H(Z \mid U) - H(Y \mid U), &&(12)
\end{aligned}
$$

for some pmf $p(u, x)$. Now, fixing a rate $R$ and a pmf $p(x)$ and varying $p(u \mid x)$ to minimize
$R_d$, the right-hand sides of (11) and (12) are lower bounded by

$$
H(Z \mid U) \ge 0
$$

and

$$
\begin{aligned}
R + H(Z \mid U) - H(Y \mid U)
&= R + H(Z \mid U) - H(Y, Z \mid U) + H(Z \mid Y, U)\\
&= R - H(Y \mid Z, U) + H(Z \mid Y, U)\\
&\ge R - H(Y \mid Z).
\end{aligned}
$$

Note that the particular choice U = Z simultaneously achieves both lower bounds with equality 
and is therefore sufficient. The rate-disturbance region thus reduces to Corollary 1. 

For a fixed pmf $p(x)$, this region has exactly two corner points: $P_1 = (H(Y \mid Z), 0)$ and
$P_2 = (H(Y), I(Y; Z))$. As we vary $p(x)$, there is one corner point $P_1$ that dominates all other
$P_1$ points. The pmf $p(x)$ for this dominant $P_1$ can be constructed by maximizing $H(Y \mid Z)$
as follows. For each $z \in \mathcal{Z}$, define $\mathcal{Y}_z \subseteq \mathcal{Y}$ to be the set of $y$ symbols that are compatible
with $z$. Let $z^*$ be a symbol that maximizes $|\mathcal{Y}_z|$. For each element of $\mathcal{Y}_{z^*}$, pick exactly one
$x$ that is compatible with it and $z^*$. Finally, place equal probability mass on each of these
$x$ values, and zero mass on all others. This pmf on $\mathcal{X}$ yields the dominant corner point $P_1$,
namely $(\log |\mathcal{Y}_{z^*}|, 0)$. It is dominant because $H(Y \mid Z) \le \max_z \log |\mathcal{Y}_z|$ for every $p(x)$. Moreover, for this distribution, $P_2$ coincides with $P_1$. Therefore, the
net contribution (modulo convexification) of each pmf $p(x)$ to the rate-disturbance region
amounts to its corner point $P_2$. This implies the alternative description of the region. Lastly, the
cardinality bound on $Q$ in the alternative description follows from the convex cover method
in [8].
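On a finite deterministic channel, the construction of the dominant corner point described above is a simple procedure. The Python sketch below implements it, together with a routine that computes $P_1$ and $P_2$ for a given $p(x)$. The channel is passed as two functions y(x) and z(x), and all names are hypothetical. When several symbols attain the maximum of $|\mathcal{Y}_z|$, the sketch picks the first one it encounters, which is one valid choice of $z^*$.

```python
from collections import defaultdict
from math import log2


def entropy(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def corollary1_corners(p_x, y, z):
    """Corner points P1 = (H(Y|Z), 0) and P2 = (H(Y), I(Y;Z)) for a given p(x)."""
    py, pz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for x, p in p_x.items():
        py[y(x)] += p
        pz[z(x)] += p
        pyz[(y(x), z(x))] += p
    h_y, h_z, h_yz = entropy(py), entropy(pz), entropy(pyz)
    return (h_yz - h_z, 0.0), (h_y, h_y + h_z - h_yz)


def dominant_pmf(alphabet, y, z):
    """Uniform pmf on one input per y-symbol compatible with a maximizing z*."""
    compatible = defaultdict(dict)  # z -> {y: one representative input x}
    for x in alphabet:
        compatible[z(x)].setdefault(y(x), x)
    z_star = max(compatible, key=lambda zz: len(compatible[zz]))
    reps = list(compatible[z_star].values())
    return {x: 1.0 / len(reps) for x in reps}
```

Consider a hypothetical channel on $\{0, 1, 2, 3\}$ with $y(x) = x \bmod 3$ and $z(x) = \lfloor x/2 \rfloor$. The construction places mass $1/2$ on $x = 0$ and $x = 1$ and gives $P_1 = P_2 = (1, 0)$. A uniform $p(x)$ instead gives $P_1 = (1, 0)$ and $P_2 = (1.5, 0.5)$.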

D. Proof of Corollary 2 

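Achievability is straightforward using a random Gaussian codebook with power control and
upper-bounding the disturbance rate at receiver $Z$ by white Gaussian noise. The converse can
be seen as follows. Clearly, $R \le \mathrm{C}(P)$. Let $\alpha^* \in [0, 1]$ be such that $R = \mathrm{C}(\alpha^* P)$. Then

$$
\begin{aligned}
n\,\mathrm{C}(\alpha^* P) = nR &\le I(X^n; Y^n) + n\epsilon_n\\
&= h(Y^n) - h(Y^n \mid X^n) + n\epsilon_n,
\end{aligned}
$$

and therefore,

$$
\begin{aligned}
h(Y^n) &\ge \frac{n}{2} \log(2\pi e) + n\,\mathrm{C}(\alpha^* P) - n\epsilon_n\\
&= \frac{n}{2} \log\bigl(2\pi e (1 + \alpha^* P)\bigr) - n\epsilon_n.
\end{aligned}
$$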

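Since $N > 1$, we can write the physically degraded form of the channel as $Y = X + W_1$,
$Z = Y + \tilde{W}_2$, where $\tilde{W}_2 \sim \mathcal{N}(0, N - 1)$ is the excess noise that receiver $Z$ experiences in
addition to receiver $Y$. Applying the vector entropy power inequality to $Z^n = Y^n + \tilde{W}_2^n$, we
conclude

$$
\begin{aligned}
\frac{1}{n} h(Z^n) &\ge \frac{1}{2} \log\bigl(2^{2h(Y^n)/n} + 2^{2h(\tilde{W}_2^n)/n}\bigr)\\
&\ge \frac{1}{2} \log\bigl(2^{-2\epsilon_n} \cdot 2\pi e (1 + \alpha^* P) + 2\pi e (N - 1)\bigr)\\
&\ge \frac{1}{2} \log\bigl(2\pi e (N + \alpha^* P)\bigr) - \epsilon_n,
\end{aligned}
$$

and finally,

$$
\begin{aligned}
R_d &\ge \frac{1}{n} I(X^n; Z^n)\\
&= \frac{1}{n} h(Z^n) - \frac{1}{2} \log(2\pi e N)\\
&\ge \mathrm{C}(\alpha^* P / N) - \epsilon_n.
\end{aligned}
$$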

E. Proof of Theorem 2 

Recall the shape of $\mathscr{R}(U, X)$ depicted in Figure 2. The coordinates of the corner points $A$
and $B$ are given by

$$
\begin{aligned}
A(U, X):\quad R &= h(X + W_1) - h(W_1), &&(13)\\
R_d &= h(X + W_2 \mid U) + h(X + W_1) - h(X + W_1 \mid U) - h(W_2), &&(14)\\
B(U, X):\quad R &= h(X + W_1 \mid U) - h(W_1), &&(15)\\
R_d &= h(X + W_2 \mid U) - h(W_2). &&(16)
\end{aligned}
$$

Proof of achievability: We specialize Theorem 1. Consider the specific $p(u, x)$ constructed
as follows. For given positive semidefinite matrices $K_u, K_v \in \mathbb{R}^{n \times n}$ with $\operatorname{tr}(K_u + K_v) \le P$,
let

$$
U \sim \mathcal{N}(0, K_u), \qquad V \sim \mathcal{N}(0, K_v), \qquad X = U + V,
$$

where $U$ and $V$ are independent. Then, the terms in Theorem 1 evaluate to

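$$
\begin{aligned}
I(X; Y) &= h(Y) - h(W_1) = \frac{1}{2} \log \frac{|K_u + K_v + K_1|}{|K_1|},\\
I(X; Y \mid U) &= h(Y \mid U) - h(W_1) = \frac{1}{2} \log \frac{|K_v + K_1|}{|K_1|},\\
I(X; Z \mid U) &= h(Z \mid U) - h(W_2) = \frac{1}{2} \log \frac{|K_v + K_2|}{|K_2|}.
\end{aligned}
$$

Simplifying the right-hand sides and introducing time-sharing leads to the desired result.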
For completeness, the coordinates of $A$ and $B$ for given matrices $K_u$, $K_v$ are

$$
\begin{aligned}
A(K_u, K_v):\quad R &= \frac{1}{2} \log \frac{|K_u + K_v + K_1|}{|K_1|}, &&(17)\\
R_d &= \frac{1}{2} \log \frac{|K_v + K_2|\,|K_u + K_v + K_1|}{|K_2|\,|K_v + K_1|}, &&(18)\\
B(K_u, K_v):\quad R &= \frac{1}{2} \log \frac{|K_v + K_1|}{|K_1|}, &&(19)\\
R_d &= \frac{1}{2} \log \frac{|K_v + K_2|}{|K_2|}. &&(20)
\end{aligned}
$$

The constituent region $\mathscr{R}(U, X)$ for fixed $K_u$ and $K_v$ is depicted in Figure 8.
Figure 8. Constituent region for a Gaussian superposition codebook with parameters $K_u$ and $K_v$.



Proof of converse: The converse proof of Theorem 1 continues to hold, and we only
need to show that Gaussian input distributions are sufficient. We proceed as follows. Since
the rate-disturbance region is convex, its boundary can be fully characterized by maximizing
$R - \lambda R_d$ for each $\lambda \ge 0$. We write

$$
\begin{aligned}
R - \lambda R_d &\le \max_{(R, R_d) \in \mathscr{R}} \{R - \lambda R_d\}\\
&= \max_{(U, X)} \max_{(R, R_d) \in \mathscr{R}(U, X)} \{R - \lambda R_d\},
\end{aligned}
$$

where the outer optimization is over the joint distribution of $(U, X)$ and the inner optimization
is over the region achieved by that distribution. The inner optimization can be solved explicitly
as follows. For ease of presentation, assume for the moment that the power constraint is of the
form $K_x \preceq S$ for some positive semidefinite matrix $S$. (That is, valid $K_x$ are precisely those
that result in the matrix $S - K_x$ being positive semidefinite.)

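First, consider $\lambda \le 1$. For any distribution $(U, X) \sim p(u, x)$, point $A(U, X)$ achieves a value
of the inner optimization at least as large as point $B(U, X)$ or any point on the line between
them: moving from $B$ to $A$ increases $R$ and $R_d$ by the same amount, so $R - \lambda R_d$ does not
decrease when $\lambda \le 1$. Using the coordinates of $A(U, X)$ in (13) and (14), we can write

$$
\begin{aligned}
R - \lambda R_d &\le \max_{(U, X)} \bigl\{\lambda \bigl(h(X + W_1 \mid U) - h(X + W_2 \mid U)\bigr) + (1 - \lambda) h(X + W_1) - h(W_1) + \lambda h(W_2)\bigr\}\\
&\overset{(a)}{\le} \lambda \cdot \max_{(U, X)} \{h(X + W_1 \mid U) - h(X + W_2 \mid U)\} + (1 - \lambda) \cdot \max_{(U, X)} \{h(X + W_1)\} - h(W_1) + \lambda h(W_2)\\
&\overset{(b)}{\le} \lambda \cdot \max_{K_x \preceq S} \Bigl\{\frac{1}{2} \log \frac{|K_x + K_1|}{|K_x + K_2|}\Bigr\} + (1 - \lambda) \cdot \max_{K_x \preceq S} \Bigl\{\frac{1}{2} \log\bigl((2\pi e)^n |K_x + K_1|\bigr)\Bigr\}\\
&\qquad - \frac{1}{2} \log\bigl((2\pi e)^n |K_1|\bigr) + \frac{\lambda}{2} \log\bigl((2\pi e)^n |K_2|\bigr).
\end{aligned}
$$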

In (a), the two maximizations are taken independently. In step (b), the first maximization is
achieved by a Gaussian $X$ that is independent of $U$, due to a theorem proved by Liu and
Viswanath [9, Thm. 8]. The optimization is now only over covariance matrices. Let $K^*$ be an
optimizer of this first maximization. The second maximization is also achieved by a Gaussian
$X$, and is optimized by $K_x = S$ since $f(K_x) = |K_x + K_1|$ is matrix monotone. It follows that

$$
\begin{aligned}
R - \lambda R_d &\le \frac{\lambda}{2} \log \frac{|K^* + K_1|}{|K^* + K_2|} + \frac{1 - \lambda}{2} \log\bigl((2\pi e)^n |S + K_1|\bigr) - \frac{1}{2} \log\bigl((2\pi e)^n |K_1|\bigr) + \frac{\lambda}{2} \log\bigl((2\pi e)^n |K_2|\bigr)\\
&= \frac{1}{2} \log \frac{|S + K_1|}{|K_1|} - \frac{\lambda}{2} \log \frac{|K^* + K_2|\,|S + K_1|}{|K_2|\,|K^* + K_1|}.
\end{aligned}
$$

But this upper bound is achieved with equality by Gaussian superposition codebooks, namely
through the point $A(K_u, K_v)$ as specified by equations (17) and (18), with $K_u = S - K^*$ and
$K_v = K^*$.

Now, consider $\lambda > 1$. The argument proceeds analogously to the previous case. For
completeness' sake, the details are as follows. We can write the inner optimization explicitly
using the coordinates of $B(U, X)$ in (15) and (16) as

$$
\begin{aligned}
R - \lambda R_d &\le \max_{(U, X)} \{h(X + W_1 \mid U) - \lambda h(X + W_2 \mid U)\} + \lambda h(W_2) - h(W_1)\\
&\overset{(a)}{\le} \max_{K_x \preceq S} \Bigl\{\frac{1}{2} \log\bigl((2\pi e)^n |K_x + K_1|\bigr) - \frac{\lambda}{2} \log\bigl((2\pi e)^n |K_x + K_2|\bigr)\Bigr\}\\
&\qquad + \frac{\lambda}{2} \log\bigl((2\pi e)^n |K_2|\bigr) - \frac{1}{2} \log\bigl((2\pi e)^n |K_1|\bigr).
\end{aligned}
$$

The optimum in (a) is achieved by a Gaussian $X$ (independent of $U$) by virtue of [9, Thm. 8],
while the other two terms are independent of the optimization variable. Let $K^*$ be an optimizer.
Then

$$
R - \lambda R_d \le \frac{1}{2} \log \frac{|K^* + K_1|}{|K_1|} - \frac{\lambda}{2} \log \frac{|K^* + K_2|}{|K_2|}.
$$

This upper bound is achieved with equality by Gaussian superposition codebooks through the
point $B(K_u, K_v)$ as given by equations (19) and (20) with $K_u = 0$ and $K_v = K^*$. This is a
power control strategy, similar to the scalar Gaussian case.

We have thus shown that under a power constraint $K_x \preceq S$, Gaussian superposition codes
are optimal. The conclusion extends to the sum power constraint $\operatorname{tr}(K_x) \le P$ by observing that

$$
\{K_x : \operatorname{tr}(K_x) \le P\} = \bigcup_{S \succeq 0,\ \operatorname{tr}(S) \le P} \{K_x : K_x \preceq S\}.
$$

In other words, the sum power constraint can be expressed as a union of constraints of the type
$K_x \preceq S$, for each of which Gaussian superposition codes are optimal. Therefore, a Gaussian
superposition code must be optimal overall, too. ■

IV. Proofs for two disturbance constraints 

A. Proof of Theorem 3 

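Codebook generation. Fix a pmf $p(u, x)$. Split the rate as $R = R_0 + R_1 + R_2 + R_3$. Define the
auxiliary rates $\tilde{R}_1 \ge R_1$ and $\tilde{R}_2 \ge R_2$, let $\epsilon' > 0$, and define the set partitions

$$
\begin{aligned}
[1:2^{n\tilde{R}_1}] &= \mathcal{L}_1(1) \cup \cdots \cup \mathcal{L}_1(2^{nR_1}),\\
[1:2^{n\tilde{R}_2}] &= \mathcal{L}_2(1) \cup \cdots \cup \mathcal{L}_2(2^{nR_2}),
\end{aligned}
$$

where $\mathcal{L}_1(\cdot)$ and $\mathcal{L}_2(\cdot)$ are indexed sets of size $2^{n(\tilde{R}_1 - R_1)}$ and $2^{n(\tilde{R}_2 - R_2)}$, respectively.

1) For each $m_0 \in [1:2^{nR_0}]$, generate $u^n(m_0)$ according to $\prod_{i=1}^n p(u_i)$.

2) For each $l_1 \in [1:2^{n\tilde{R}_1}]$, generate $z_1^n(m_0, l_1)$ according to $\prod_{i=1}^n p(z_{1i} \mid u_i(m_0))$. Likewise,
for each $l_2 \in [1:2^{n\tilde{R}_2}]$, generate $z_2^n(m_0, l_2)$ according to $\prod_{i=1}^n p(z_{2i} \mid u_i(m_0))$.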
3) For each $(m_0, m_1, m_2)$, let $\mathcal{S}(m_0, m_1, m_2)$ be the set of all pairs $(l_1, l_2)$ from the product set $\mathcal{L}_1(m_1) \times \mathcal{L}_2(m_2)$ such that $(z_1^n(m_0, l_1), z_2^n(m_0, l_2)) \in \mathcal{T}_{\epsilon'}^{(n)}(Z_1, Z_2 \mid u^n(m_0))$.

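4) For each $(m_0, l_1, l_2)$ and $m_3 \in [1:2^{nR_3}]$, generate $x^n(m_0, l_1, l_2, m_3)$ according to

$$
\prod_{i=1}^n p\bigl(x_i \mid u_i(m_0), z_{1i}(m_0, l_1), z_{2i}(m_0, l_2)\bigr)
$$

if $(l_1, l_2) \in \mathcal{S}(m_0, m_1, m_2)$, where $m_1$ and $m_2$ are the indices of the sets containing $l_1$ and $l_2$, respectively. Otherwise, draw $x^n(m_0, l_1, l_2, m_3)$ uniformly from $\mathcal{X}^n$.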

5) Choose $(L_1^{(m_0,m_1,m_2)}, L_2^{(m_0,m_1,m_2)})$ uniformly from $\mathcal{S}(m_0, m_1, m_2)$. If $\mathcal{S}(m_0, m_1, m_2)$ is empty, choose $(1, 1)$.
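To make steps 3 and 5 concrete, the following toy sketch assumes contiguous index sets $\mathcal{L}_j(m) = \{(m-1)B_j + 1, \ldots, mB_j\}$ and replaces joint typicality by independent coin flips with a hypothetical probability `p` (a stand-in for roughly $2^{-nI(Z_1;Z_2 \mid U)}$). All function names are illustrative, and no actual sequences are generated; the fallback to $(1, 1)$ follows step 5 literally.

```python
import random


def bin_members(m: int, bin_size: int) -> range:
    """1-based indices of the (hypothetical, contiguous) set L_j(m)."""
    return range((m - 1) * bin_size + 1, m * bin_size + 1)


def bin_of(index: int, bin_size: int) -> int:
    """1-based bin index m such that index lies in L_j(m)."""
    return (index - 1) // bin_size + 1


def make_toy_typicality(p: float, rng: random.Random):
    """Stand-in for joint typicality: each pair (l1, l2) is declared
    'typical' independently with probability p, memoized for consistency."""
    cache = {}

    def is_typical(l1: int, l2: int) -> bool:
        if (l1, l2) not in cache:
            cache[(l1, l2)] = rng.random() < p
        return cache[(l1, l2)]

    return is_typical


def select_pair(m1, m2, size1, size2, is_typical, rng):
    """Steps 3 and 5: build S as the 'typical' pairs in L1(m1) x L2(m2),
    then pick one uniformly; fall back to (1, 1) if S is empty."""
    s_set = [(l1, l2)
             for l1 in bin_members(m1, size1)
             for l2 in bin_members(m2, size2)
             if is_typical(l1, l2)]
    if not s_set:
        return (1, 1), s_set
    return rng.choice(s_set), s_set


rng = random.Random(0)
typical = make_toy_typicality(p=1 / 16, rng=rng)
pair, s_set = select_pair(2, 3, 8, 8, typical, rng)
print(pair, len(s_set))
```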

Encoding. To send message $m = (m_0, m_1, m_2, m_3)$, transmit the sequence

$$
x^n\bigl(m_0, L_1^{(m_0,m_1,m_2)}, L_2^{(m_0,m_1,m_2)}, m_3\bigr).
$$

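Decoding. Let $\epsilon > \epsilon'$. Upon receiving $y^n$, define the tuple

$$
\begin{aligned}
T(m_0, m_1, m_2, m_3) = \bigl(&u^n(m_0),\ z_1^n(m_0, L_1^{(m_0,m_1,m_2)}),\ z_2^n(m_0, L_2^{(m_0,m_1,m_2)}), \\
&x^n(m_0, L_1^{(m_0,m_1,m_2)}, L_2^{(m_0,m_1,m_2)}, m_3),\ y^n\bigr).
\end{aligned}
$$

Declare that $\hat{m} = (\hat{m}_0, \hat{m}_1, \hat{m}_2, \hat{m}_3)$ has been sent if it is the unique message such that $T(\hat{m}_0, \hat{m}_1, \hat{m}_2, \hat{m}_3) \in \mathcal{T}_\epsilon^{(n)}(U, Z_1, Z_2, X, Y)$.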

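Analysis of the probability of error. Without loss of generality, assume that $m_0 = m_1 = m_2 = m_3 = 1$ is transmitted. Define the following events:

$$
\begin{aligned}
\mathcal{E}_{e1} &: \mathcal{S}(1,1,1) \text{ is empty}, \\
\mathcal{E}_{e2} &: \mathcal{S}(1,1,1) \text{ contains two distinct pairs with equal first or second component}, \\
\mathcal{E}_0 &: T(1,1,1,1) \notin \mathcal{T}_\epsilon^{(n)}(U, Z_1, Z_2, X, Y), \\
\mathcal{E}_i &: T(m_0, m_1, m_2, m_3) \in \mathcal{T}_\epsilon^{(n)}(U, Z_1, Z_2, X, Y) \text{ for some } (m_0, m_1, m_2, m_3) \in \mathcal{M}_i, \quad i \in \{1, \ldots, 5\},
\end{aligned}
$$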

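where the message subsets $\mathcal{M}_i$ are specified in Table 1. Defining the "encoding error" event $\mathcal{E}_e = \mathcal{E}_{e1} \cup \mathcal{E}_{e2}$ and the "decoding error" event $\mathcal{E}_d = \mathcal{E}_0 \cup \mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{E}_3 \cup \mathcal{E}_4 \cup \mathcal{E}_5$, the probability of error can be upper-bounded as

$$
P(\mathcal{E}) \le P(\mathcal{E}_e \cup \mathcal{E}_d) \le P(\mathcal{E}_e) + P(\mathcal{E}_d \mid \mathcal{E}_e^c).
$$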

The motivation for introducing $\mathcal{E}_{e2}$ as an "error" is to simplify the analysis of the second probability term.

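We bound $P(\mathcal{E}_e)$ by the following lemma. Let $r_1 = \tilde{R}_1 - R_1$ and $r_2 = \tilde{R}_2 - R_2$.

Lemma 1. $P(\mathcal{E}_e) \to 0$ as $n \to \infty$ if

$$
\begin{aligned}
r_1 + r_2 &> I(Z_1; Z_2 \mid U) + \delta(\epsilon'), && (21) \\
r_1/2 + r_2 &< I(Z_1; Z_2 \mid U) - \delta(\epsilon'), && (22) \\
r_1 + r_2/2 &< I(Z_1; Z_2 \mid U) - \delta(\epsilon'). && (23)
\end{aligned}
$$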

Proof sketch: First, consider $\mathcal{E}_{e1}$. As in the proof of Marton's inner bound for the broadcast channel, the mutual covering lemma [8] implies $P(\mathcal{E}_{e1}) \to 0$ as $n \to \infty$ if (21) holds.

Now consider $\mathcal{E}_{e2}$, for which we need to control the number of typical pairs that can occur in the same "row" or "column" of the product set $\mathcal{L}_1(m_1) \times \mathcal{L}_2(m_2)$, i.e., for the same $l_1$ or $l_2$ coordinate. The probability $P(\mathcal{E}_{e2})$ tends to zero provided that (22) and (23) hold.
Message subset      $m_0$      $m_1$      $m_2$      $m_3$
$\mathcal{M}_0$     1          1          1          1
$\mathcal{M}_1$     1          1          1          $\ne 1$
$\mathcal{M}_2$     1          $\ne 1$    1          any
$\mathcal{M}_3$     1          1          $\ne 1$    any
$\mathcal{M}_4$     1          $\ne 1$    $\ne 1$    any
$\mathcal{M}_5$     $\ne 1$    any        any        any

Table 1. Message subsets for decoding error events.



This is akin to the birthday problem [13], where $k$ samples are drawn uniformly and independently from $[1:N]$, and the interest is in samples that have the same value (collisions). It is well known that for the probability of collision to be $p_c$, the number of samples required is roughly $k \approx \sqrt{-2N \ln(1 - p_c)}$, which scales with $\sqrt{N}$. In our case, the number of samples is the cardinality of the set $\mathcal{S}(m_0, m_1, m_2)$, which is roughly $k = 2^{n(r_1 + r_2 - I(Z_1; Z_2 \mid U))}$. The samples are categorized into $N_1 = 2^{nr_1}$ and $N_2 = 2^{nr_2}$ classes along rows and columns, respectively. To achieve a probability of collision $p_c \to 0$ along both dimensions, we need $k \ll \min\{\sqrt{N_1}, \sqrt{N_2}\}$, which yields exactly the conditions (22) and (23).
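The birthday heuristic can be checked numerically. This sketch compares the exact collision probability for $k$ uniform draws from $[1:N]$ with the approximation $k \approx \sqrt{-2N \ln(1 - p_c)}$ quoted above; the helper names are illustrative.

```python
import math


def collision_probability(k: int, n: int) -> float:
    """Exact probability that k i.i.d. uniform draws from [1:n] repeat a value."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (n - i) / n
    return 1.0 - p_distinct


def samples_for_collision(p_c: float, n: int) -> float:
    """Approximation k ~ sqrt(-2 n ln(1 - p_c)) for collision probability p_c."""
    return math.sqrt(-2.0 * n * math.log(1.0 - p_c))


print(round(samples_for_collision(0.5, 365), 2))  # 22.49
print(round(collision_probability(23, 365), 4))   # 0.5073
```

Quadrupling $N$ doubles the approximate $k$, which is the $\sqrt{N}$ scaling used above with $N_1 = 2^{nr_1}$ and $N_2 = 2^{nr_2}$.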

A rigorous proof is given in Appendix A. ■ 
Before we proceed to bound the probability of decoding error, we need the following lemma, 
which is proved in Appendix B. 

Lemma 2 (Independence lemma). Consider a finite set $\mathcal{A}$ and a subset $\mathcal{A}' \subseteq \mathcal{A}$. Let $p_A$ be an arbitrary pmf over $\mathcal{A}$. Let the random vector $A^n$ be distributed proportionally to the product distribution $\prod_{i=1}^n p_A(a_i)$, restricted to the support set $\{a^n : a_k \in \mathcal{A}' \text{ for some } k\}$. Let $I$ be drawn uniformly from $\{i : A_i \in \mathcal{A}'\}$. Let $J = ((I + s - 1) \bmod n) + 1$ for some integer $s \in [1:(n-1)]$. Then, the random variables $A_I$ and $A_J$ are independent.
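Lemma 2 can be checked by brute force for small parameters. The sketch below enumerates all sequences $a^n$, weights each by $\prod_i p_A(a_i)$ on the support set, spreads that weight uniformly over the qualifying positions $I$, and compares the joint pmf of $(A_I, A_J)$ with the product of its marginals. The alphabet $\{0, 1, 2\}$, the pmf, and the subset $\mathcal{A}'$ are arbitrary illustrative choices, and positions are 0-based in the code.

```python
from itertools import product


def joint_of_ai_aj(p_a, a_prime, n, s):
    """Exact joint pmf of (A_I, A_J) in the setting of Lemma 2.

    A^n ~ prod_i p_a[a_i] restricted to sequences with some a_k in a_prime;
    I is uniform over positions with A_i in a_prime; J = I + s cyclically.
    Positions are 0-based here, so J corresponds to (I + s) % n.
    """
    joint, total = {}, 0.0
    for seq in product(range(len(p_a)), repeat=n):
        hits = [i for i in range(n) if seq[i] in a_prime]
        if not hits:
            continue  # outside the support set
        weight = 1.0
        for a in seq:
            weight *= p_a[a]
        total += weight
        for i in hits:  # each qualifying position has probability 1/len(hits)
            key = (seq[i], seq[(i + s) % n])
            joint[key] = joint.get(key, 0.0) + weight / len(hits)
    return {k: v / total for k, v in joint.items()}  # normalization constant c


def max_dependence(joint):
    """Largest |p(y, z) - p(y) p(z)| over all value pairs (0 iff independent)."""
    py, pz = {}, {}
    for (y, z), v in joint.items():
        py[y] = py.get(y, 0.0) + v
        pz[z] = pz.get(z, 0.0) + v
    return max(abs(joint.get((y, z), 0.0) - py[y] * pz[z])
               for y in py for z in pz)


joint = joint_of_ai_aj(p_a=[0.5, 0.3, 0.2], a_prime={0, 1}, n=4, s=1)
print(max_dependence(joint))  # zero up to floating-point rounding
```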

We bound the probability $P(\mathcal{E}_d \mid \mathcal{E}_e^c)$ by the following lemma.

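Lemma 3. $P(\mathcal{E}_d \mid \mathcal{E}_e^c) \to 0$ as $n \to \infty$ if

$$
\begin{aligned}
R_3 &< H(Y \mid Z_1, Z_2, U) - \delta(\epsilon), && (24) \\
\tilde{R}_1 + R_3 &< H(Y \mid Z_2, U) + I(Z_1; Z_2 \mid U) - \delta(\epsilon), && (25) \\
\tilde{R}_2 + R_3 &< H(Y \mid Z_1, U) + I(Z_1; Z_2 \mid U) - \delta(\epsilon), && (26) \\
\tilde{R}_1 + \tilde{R}_2 + R_3 &< H(Y \mid U) + I(Z_1; Z_2 \mid U) - \delta(\epsilon), && (27) \\
R_0 + \tilde{R}_1 + \tilde{R}_2 + R_3 &< H(Y) + I(Z_1; Z_2 \mid U) - \delta(\epsilon). && (28)
\end{aligned}
$$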

Proof sketch: The events of which $\mathcal{E}_d$ is composed are illustrated in Figure 9, which also depicts the structure of the codebook for $m_0 = 1$. The product sets $\mathcal{L}_1(m_1) \times \mathcal{L}_2(m_2)$, for each $(m_1, m_2)$, are represented by shaded squares. In each product set, the sequence pair selected in step 5 of the codebook generation procedure is shown with its superposed $x^n$ codewords, as created in step 4. The correct codeword $x^n(1,1,1,1)$ is shown as a white circle, which is connected to the received sequence $y^n$. The codewords that may be mistakenly detected at the receiver are shown as black circles. The product sets associated with decoding error events $\mathcal{E}_1$, $\mathcal{E}_2$, $\mathcal{E}_3$, and $\mathcal{E}_4$ are labeled 1, 2, 3, and 4, respectively.

We bound the probability of each sub-event of $\mathcal{E}_d$. First, note that by the conditional typicality lemma in [8], $P(\mathcal{E}_0) \to 0$ as $n \to \infty$ (this relies on $\epsilon' < \epsilon$). The probabilities of the events $\mathcal{E}_1$ through $\mathcal{E}_5$ conditioned on $\mathcal{E}_e^c$ tend to zero as $n \to \infty$ under conditions (24) through (28), respectively.

The events $\mathcal{E}_2$ and $\mathcal{E}_3$ require the most careful analysis, since the true codeword $x^n(1,1,1,1)$ and the codewords with which it may be confused can share the same $z_1^n$ or $z_2^n$ sequence (see the dashed line and the circles on it in Figure 9). Moreover, even when the chosen pairs in two different product sets do not share one of the two coordinates (see the chosen pairs for $(m_1, m_2) = (1,1)$ and $(2,1)$ in Figure 9), correlation could potentially be caused by the selection procedure in step 5 of codebook generation. We use the independence lemma (Lemma 2) to show that the event $\mathcal{E}_e^c$ prevents this correlation leakage from occurring. The application of the lemma is what distinguishes this analysis from the conventional Marton inner bound for broadcast channels [14, 15]. There, analysis of the selection process can be avoided altogether, since each receiver decodes only one of the two coordinates.

[Figure 9: codebook for $m_0 = 1$ with columns $m_2 = 1, 2, 3$; labels include $z_1^n(1,1)$, $z_1^n(1,2)$, $z_2^n(1, L_2^{(1,1,1)})$, and the transmitted codeword $x^n(1,1,1,1)$.]
Figure 9. Illustration of decoding error events, for $m_0 = 1$.

A detailed proof for the event $\mathcal{E}_3$ is given in Appendix C; the other events follow likewise. ■

Analysis of disturbance rate. When viewed by receiver $Z_1$, the codeword for message $m = (m_0, m_1, m_2, m_3)$ appears as $z_1^n(m_0, L_1^{(m_0,m_1,m_2)})$. We can pessimistically assume that all sequences $z_1^n(m_0, l_1)$ as created in step 2 of codebook generation can be seen at the receiver for some message $m$. Therefore, the disturbance at $Z_1$ is upper-bounded by the logarithm of the number of possible sequences: $I(X^n; Z_1^n) \le H(Z_1^n) \le n(R_0 + \tilde{R}_1)$. Applying the same argument for $Z_2$, the proposed scheme achieves

$$
\begin{aligned}
R_0 + \tilde{R}_1 &\le R_{d,1}, && (29) \\
R_0 + \tilde{R}_2 &\le R_{d,2}. && (30)
\end{aligned}
$$

Conclusion of the proof. Collecting inequalities (21) through (30), recalling $R = R_0 + R_1 + R_2 + R_3$, and using the Fourier-Motzkin procedure to eliminate $R_0$, $R_1$, $R_2$, and $R_3$ (together with the auxiliary rates $\tilde{R}_1$ and $\tilde{R}_2$) leads to the $(R, R_{d,1}, R_{d,2})$ region claimed in the theorem.
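For readers unfamiliar with the Fourier-Motzkin procedure, the sketch below shows a generic elimination step for a system of linear inequalities $a^\top x \le b$: every pair of rows with opposite-sign coefficients on the eliminated variable is combined with positive multipliers so that the variable cancels. It is a minimal illustration on a hypothetical toy system, not the elimination carried out for Theorem 3, and it removes only trivial and duplicate rows, not all redundant ones.

```python
def fm_eliminate(ineqs, var):
    """One Fourier-Motzkin step on rows (a, b) meaning sum_j a[j] * x[j] <= b.

    Rows with a zero coefficient on x[var] are kept; every row with a positive
    coefficient is combined with every row with a negative coefficient, using
    positive multipliers chosen so that x[var] cancels.
    """
    pos, neg, out = [], [], []
    for a, b in ineqs:
        if a[var] > 0:
            pos.append((a, b))
        elif a[var] < 0:
            neg.append((a, b))
        else:
            out.append((a, b))
    for ap, bp in pos:
        for an, bn in neg:
            cp, cn = ap[var], -an[var]  # both strictly positive
            a = [cn * x + cp * y for x, y in zip(ap, an)]
            out.append((a, cn * bp + cp * bn))
    return out


def prune(ineqs):
    """Drop trivial rows 0 <= b with b >= 0 and exact duplicates."""
    seen, out = set(), []
    for a, b in ineqs:
        if all(c == 0 for c in a) and b >= 0:
            continue
        key = (tuple(a), b)
        if key not in seen:
            seen.add(key)
            out.append((a, b))
    return out


# Hypothetical toy system in x = (R, R1, R2): R = R1 + R2, 0 <= R1 <= 1, 0 <= R2 <= 2.
system = [([1, -1, -1], 0), ([-1, 1, 1], 0),
          ([0, 1, 0], 1), ([0, 0, 1], 2),
          ([0, -1, 0], 0), ([0, 0, -1], 0)]
for var in (1, 2):  # eliminate R1, then R2
    system = prune(fm_eliminate(system, var))
print(system)  # [([1, 0, 0], 3), ([-1, 0, 0], 0)]
```

Eliminating $R_1$ and $R_2$ from the toy split leaves exactly $0 \le R \le 3$, as expected.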
Finally, the statement of Remark 6 follows from

$$
\begin{aligned}
-I(Z_1; Z_2 \mid U) + I(Z_1; Z_2 \mid U, Y)
&= -H(Z_2 \mid U) + H(Z_2 \mid U, Z_1) + H(Z_2 \mid U, Y) - H(Z_2 \mid U, Y, Z_1) \\
&= -I(Y; Z_2 \mid U) + I(Y; Z_2 \mid U, Z_1),
\end{aligned}
$$

which leads to the equality

$$
\begin{aligned}
&H(Y \mid Z_1, Z_2, U) + H(Y \mid U) - I(Z_1; Z_2 \mid U) + I(Z_1; Z_2 \mid U, Y) \\
&\quad = H(Y \mid Z_1, Z_2, U) + H(Y \mid U) - I(Y; Z_2 \mid U) + I(Y; Z_2 \mid U, Z_1) \\
&\quad = H(Y \mid Z_1, U) + H(Y \mid Z_2, U).
\end{aligned}
$$
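Since Remark 6 is a pure entropy identity, it holds for any joint pmf of $(U, Z_1, Z_2, Y)$. The sketch below checks it numerically on a randomly drawn pmf over small alphabets, together with the equivalent form of the sum-rate expression used in (34) of Corollary 4; the helper names, alphabet sizes, and random seed are arbitrary.

```python
import itertools
import math
import random


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinates in idx."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_entropy(pmf, a, b):
    """H(A | B) = H(A, B) - H(B); a and b are lists of coordinates."""
    return entropy(pmf, a + b) - entropy(pmf, b)


def cond_mi(pmf, a, b, c):
    """I(A; B | C) = H(A | C) - H(A | B, C)."""
    return cond_entropy(pmf, a, c) - cond_entropy(pmf, a, b + c)


# Random joint pmf of (U, Z1, Z2, Y) on small alphabets (coordinates 0..3).
rng = random.Random(7)
outcomes = list(itertools.product(range(2), range(3), range(2), range(3)))
weights = [rng.random() for _ in outcomes]
total = sum(weights)
pmf = {o: w / total for o, w in zip(outcomes, weights)}
U, Z1, Z2, Y = [0], [1], [2], [3]

# Remark 6: both sides of the displayed equality.
lhs = (cond_entropy(pmf, Y, Z1 + Z2 + U) + cond_entropy(pmf, Y, U)
       - cond_mi(pmf, Z1, Z2, U) + cond_mi(pmf, Z1, Z2, U + Y))
rhs = cond_entropy(pmf, Y, Z1 + U) + cond_entropy(pmf, Y, Z2 + U)

# The two expressions for the sum-rate bound in (34).
lhs34 = (cond_mi(pmf, Y, Z1 + Z2 + U, []) + cond_mi(pmf, Y, U, [])
         + cond_mi(pmf, Z1, Z2, U))
rhs34 = (cond_mi(pmf, Y, Z1 + U, []) + cond_mi(pmf, Y, Z2 + U, [])
         + cond_mi(pmf, Z1, Z2, U + Y))
print(abs(lhs - rhs) < 1e-9, abs(lhs34 - rhs34) < 1e-9)  # True True
```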
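B. Proof of Theorem 4
First, consider

$$
\begin{aligned}
nR &\le I(X^n; Y^n) + n\epsilon_n \\
&= \sum_{i=1}^n I(X^n; Y_i \mid Y^{i-1}) + n\epsilon_n \\
&= \sum_{i=1}^n I(X_i; Y_i \mid Y^{i-1}) + n\epsilon_n \\
&= nI(X; Y \mid Q) + n\epsilon_n \\
&= nH(Y \mid Q) + n\epsilon_n,
\end{aligned}
$$

where the third line holds because $Y_i$ is a function of $X_i$.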



Furthermore,

$$
\begin{aligned}
nR_{d,1} &\ge I(X^n; Z_1^n) \\
&\ge I(Y^n; Z_1^n) \\
&= \sum_{i=1}^n I(Y_i; Z_1^n \mid Y^{i-1}) \\
&\ge \sum_{i=1}^n I(Y_i; Z_{1i} \mid Y^{i-1}) \\
&= nI(Y; Z_1 \mid Q),
\end{aligned}
$$

where $X = X_T$, $Y = Y_T$, $Z_1 = Z_{1T}$, and $Q = (Y^{T-1}, T)$ with $T \sim \mathrm{Unif}[1:n]$. The same argument leads to

$$
nR_{d,2} \ge nI(Y; Z_2 \mid Q),
$$

with the same random variable identifications and, additionally, $Z_2 = Z_{2T}$. Finally, the cardinality bound on $Q$ follows from the convex cover method in [8].

C. Proof of Theorem 5
First, we specialize Theorem 3 as follows.

Corollary 4. The rate-disturbance region $\mathscr{R}$ of the deterministic channel with two disturbance constraints is inner-bounded by the set of rate triples $(R, R_{d,1}, R_{d,2})$ such that

$$
\begin{aligned}
R &\le H(Y), && (31) \\
R_{d,1} &\ge I(Y; Z_1, U), && (32) \\
R_{d,2} &\ge I(Y; Z_2, U), && (33) \\
R_{d,1} + R_{d,2} &\ge I(Y; Z_1, Z_2, U) + I(Y; U) + I(Z_1; Z_2 \mid U) \\
&= I(Y; Z_1, U) + I(Y; Z_2, U) + I(Z_1; Z_2 \mid U, Y), && (34)
\end{aligned}
$$

for some pmf $p(u,x)$.

The two equivalent expressions in (34) originate from Remark 6. An example of the constituent 
regions of Corollary 4 for fixed p(u, x) is depicted in Figure 10. The figure also illustrates how 
the corollary follows from Theorem 3: Each constituent region of the corollary is a strict subset 
of the constituent region of the theorem, for the same p(u,x). 
Figure 10. Constituent region for Corollary 4, for a fixed p(u,x). Each face is annotated by the 
inequality that defines it. For comparison, the constituent region of Theorem 3 is shown with dashed lines 
(see Figure 5). 



Proof of Corollary 4: In Theorem 3, consider the case where (1) is met with equality, i.e., $R = H(Y)$. This yields a subset region, which is still achievable. It simplifies to

$$
\begin{aligned}
R_{d,1} + R_{d,2} &\ge I(Z_1; Z_2 \mid U), && (35) \\
R_{d,1} &\ge I(Y; Z_1, U), && (36) \\
R_{d,2} &\ge I(Y; Z_2, U), && (37) \\
R_{d,1} + R_{d,2} &\ge I(Y; Z_1, Z_2, U) + I(Z_1; Z_2 \mid U), && (38) \\
R_{d,1} + R_{d,2} &\ge I(Y; Z_1, Z_2, U) + I(Y; U) + I(Z_1; Z_2 \mid U) \\
&= I(Y; Z_1, U) + I(Y; Z_2, U) + I(Z_1; Z_2 \mid U, Y). && (39)
\end{aligned}
$$

Clearly, conditions (35) and (38) are dominated by inequality (39), and the desired result follows. ■

Proof of achievability for Theorem 5: We further specialize Corollary 4. We choose $U = Z_1 \vee Z_2$, i.e., the common part of $Z_1$ and $Z_2$. This implies that condition (34) can be omitted, since $I(Z_1; Z_2 \mid U, Y) = 0$ for all $p(u,x)$ by assumption. Furthermore, $U$ can be dropped from conditions (32) and (33) by virtue of being a function of $Z_1$ and $Z_2$. We conclude that

$$
\begin{aligned}
R &\le H(Y), && (40) \\
R_{d,1} &\ge I(Y; Z_1), && (41) \\
R_{d,2} &\ge I(Y; Z_2), && (42)
\end{aligned}
$$

is achievable for all $p(x)$. Adding a time-sharing random variable $Q$ completes the proof.

Note that in the special case where $Y \preceq Z_1$ or $Y \preceq Z_2$, the same conclusion holds with the choice $U = \emptyset$. ■
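For a fixed $p(x)$, the region in (40) through (42) has the single corner point $(H(Y), I(Y; Z_1), I(Y; Z_2))$. The sketch below computes it for a hypothetical deterministic channel with $X = (X_a, X_b) \in \{0,1\}^2$, $Y = X$, $Z_1 = X_a$, and $Z_2 = X_a \oplus X_b$. Because $Y$ determines $Z_1$ and $Z_2$ in this example, $I(Z_1; Z_2 \mid U, Y) = 0$ for every $p(u,x)$. The channel, the input pmfs, and the function names are illustrative choices only.

```python
import math


def entropy(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def push_forward(p_x, f):
    """pmf of f(X) when X ~ p_x."""
    out = {}
    for x, p in p_x.items():
        out[f(x)] = out.get(f(x), 0.0) + p
    return out


def corner_point(p_x, y_fn, z1_fn, z2_fn):
    """(H(Y), I(Y;Z1), I(Y;Z2)) for Y, Z1, Z2 deterministic functions of X,
    using I(Y;Z) = H(Y) + H(Z) - H(Y, Z)."""
    h_y = entropy(push_forward(p_x, y_fn))

    def mi_with_y(z_fn):
        h_z = entropy(push_forward(p_x, z_fn))
        h_yz = entropy(push_forward(p_x, lambda x: (y_fn(x), z_fn(x))))
        return h_y + h_z - h_yz

    return h_y, mi_with_y(z1_fn), mi_with_y(z2_fn)


# Hypothetical channel: X = (a, b) bits, Y = X, Z1 = a, Z2 = a XOR b.
def y_fn(x):
    return x


def z1_fn(x):
    return x[0]


def z2_fn(x):
    return x[0] ^ x[1]


uniform = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
print(corner_point(uniform, y_fn, z1_fn, z2_fn))  # (2.0, 1.0, 1.0)
matched = {(0, 0): 0.5, (1, 1): 0.5}
print(corner_point(matched, y_fn, z1_fn, z2_fn))  # (1.0, 1.0, 0.0)
```

Concentrating $p(x)$ on $\{(0,0), (1,1)\}$ makes $Z_2$ constant, so this input achieves rate 1 with zero disturbance at $Z_2$.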

D. Proof of Corollary 3

Proof of achievability: We prove the result for $Z_1 \preceq Z_2$; the other case follows by symmetry. We specialize the achievable region of Theorem 3 by choosing $U = Z_2$. The rate-disturbance constraints are

$$
\begin{aligned}
R &\le H(Y), && (43) \\
R_{d,1} + R_{d,2} &\ge 0, && (44) \\
R - R_{d,1} &\le H(Y \mid Z_1), && (45) \\
R - R_{d,2} &\le H(Y \mid Z_2), && (46) \\
R - R_{d,1} - R_{d,2} &\le H(Y \mid Z_1), && (47) \\
2R - R_{d,1} - R_{d,2} &\le H(Y \mid Z_1) + H(Y \mid Z_2). && (48)
\end{aligned}
$$

Clearly, (44) is vacuous. Furthermore, (47) is dominated by (45), and (48) is dominated by the sum of (45) and (46). This completes the proof. ■
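Proof of converse: The first inequality follows from Fano's inequality as

$$
\begin{aligned}
nR &\le I(X^n; Y^n) + n\epsilon_n \\
&= H(Y^n) + n\epsilon_n \\
&\le nH(Y) + n\epsilon_n,
\end{aligned}
$$

where $Y = Y_Q$ and $Q \sim \mathrm{Unif}[1:n]$. The other two inequalities follow as

$$
\begin{aligned}
n(R - R_{d,1}) &\le nR - I(X^n; Z_1^n) \\
&\le H(Y^n) - H(Z_1^n) + n\epsilon_n \\
&\le H(Y^n, Z_1^n) - H(Z_1^n) + n\epsilon_n \\
&= H(Y^n \mid Z_1^n) + n\epsilon_n \\
&\le nH(Y \mid Z_1) + n\epsilon_n,
\end{aligned}
$$

with $Z_1 = Z_{1Q}$, and likewise for $n(R - R_{d,2})$. ■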
Appendix

A. Proof of Lemma 1

The product bin $(m_1, m_2) = (1, 1)$ for $m_0 = 1$ contains $lm$ sequence pairs, where $l = 2^{nr_1}$ and $m = 2^{nr_2}$. Each pair $(Z_1^n(1, l_1), Z_2^n(1, l_2))$, for $l_1 \in [1:l]$ and $l_2 \in [1:m]$, has probability $p = 2^{-nI(Z_1; Z_2 \mid U)}$ to be jointly typical. Now fix one coordinate, say $l_1 = 1$. The corresponding "row" of the bin contains $m$ sequences $Z_2^n(1, l_2)$, each of which has an independent probability $p$ of being jointly typical with $Z_1^n(1, 1)$. Let $K$ be the total number of typical sequences in this row. Then

$$
\begin{aligned}
P(K = 0) &= (1 - p)^m, \\
P(K = 1) &= mp(1 - p)^{m-1}, \\
P(K \ge 2) &= 1 - (1 - p + mp)(1 - p)^{m-1} \\
&\le 1 - \bigl(1 + (m-1)p\bigr)\bigl(1 - (m-1)p\bigr) \\
&\le m^2 p^2,
\end{aligned}
$$

where the first inequality uses Bernoulli's inequality $(1 - p)^{m-1} \ge 1 - (m-1)p$. We have thus upper-bounded the probability of encountering two or more typical pairs in a single row. Consequently, by the union bound over the $l$ rows, the probability of two or more typical pairs occurring in any row is upper-bounded by $lm^2p^2$. Substituting the definitions leads to the desired inequality (22). The same argument can be made for columns of the bin, which leads to (23).
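The two bounds used above can be compared numerically. The sketch evaluates the exact $P(K \ge 2)$ for $K \sim \mathrm{Binomial}(m, p)$ against $(m-1)^2p^2$, and computes the union bound $lm^2p^2$ for hypothetical exponents $nr_1 = 6$, $nr_2 = 4$, and $nI(Z_1; Z_2 \mid U) = 8$ (for example, $n = 20$, $r_1 = 0.3$, $r_2 = 0.2$, $I = 0.4$), for which (22) holds. The function names are illustrative.

```python
def prob_two_or_more(m: int, p: float) -> float:
    """Exact P(K >= 2) for K ~ Binomial(m, p)."""
    return 1.0 - (1.0 - p + m * p) * (1.0 - p) ** (m - 1)


def row_collision_bound(l: int, m: int, p: float) -> float:
    """Union bound over the l rows of the bin: l * m^2 * p^2."""
    return l * m**2 * p**2


# Check the Bernoulli-based bound (m - 1)^2 p^2 on a few values
# (a small tolerance absorbs floating-point rounding).
ok = all(prob_two_or_more(m, p) <= (m - 1) ** 2 * p**2 + 1e-12
         for m in (2, 10, 100) for p in (0.001, 0.01, 0.1))
print(ok)  # True

# Hypothetical exponents: n r1 = 6, n r2 = 4, n I(Z1;Z2|U) = 8.
print(row_collision_bound(2**6, 2**4, 2.0**-8))  # 0.25
```

The union bound equals $2^{n(r_1 + 2r_2 - 2I(Z_1; Z_2 \mid U))}$, which vanishes as $n \to \infty$ exactly when (22) holds.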



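B. Proof of independence lemma (Lemma 2)

We prove the lemma for $s = 1$; the remaining cases follow by symmetry. For ease of notation, define the specialized modulo operator $[x] = 1 + ((x - 1) \bmod n)$, the indicator function $\mathbb{1}_{\mathcal{A}'}(a) = 1$ if $a \in \mathcal{A}'$ and $0$ otherwise, and the shorthand notations $Y = A_I$ and $Z = A_J$. Notice that

$$
p(a^n) = \begin{cases} c \prod_{i=1}^n p_A(a_i) & \text{if } a_k \in \mathcal{A}' \text{ for some } k \in [1:n], \\ 0 & \text{otherwise}, \end{cases}
$$

where $c$ is a normalization constant, the exact value of which is not relevant. Further,

$$
p(i \mid a^n) = \begin{cases} \dfrac{1}{\sum_{k=1}^n \mathbb{1}_{\mathcal{A}'}(a_k)} & \text{if } a_i \in \mathcal{A}', \\ 0 & \text{otherwise}. \end{cases}
$$

The joint distribution of $(A^n, I, J, Y, Z)$ is then

$$
p(a^n, i, j, y, z) = \begin{cases} \dfrac{c \prod_{k=1}^n p_A(a_k)}{\sum_{k=1}^n \mathbb{1}_{\mathcal{A}'}(a_k)} & \text{if } a_i \in \mathcal{A}',\ j = [i+1],\ y = a_i,\ z = a_j, \\ 0 & \text{otherwise}. \end{cases}
$$

Partially marginalizing, it follows that

$$
p(y, z) = \sum_{i=1}^n \sum_{\substack{a^n:\, a_i \in \mathcal{A}' \\ a_i = y,\ a_{[i+1]} = z}} \frac{c \prod_{k=1}^n p_A(a_k)}{\sum_{k=1}^n \mathbb{1}_{\mathcal{A}'}(a_k)}.
$$

It is clear that $p(y, z) = p(y)p(z) = 0$ if $y \notin \mathcal{A}'$. On the other hand, for $y \in \mathcal{A}'$, we have

$$
p(y, z) = \sum_{i=1}^n \sum_{\substack{a^n:\, a_i = y \\ a_{[i+1]} = z}} \frac{c \prod_{k=1}^n p_A(a_k)}{\sum_{k=1}^n \mathbb{1}_{\mathcal{A}'}(a_k)}.
$$

The fraction under the sum is invariant under permutations of $a^n$. Therefore,

$$
\begin{aligned}
p(y, z) &= nc \sum_{\substack{a^n:\, a_1 = y \\ a_2 = z}} \frac{\prod_{k=1}^n p_A(a_k)}{\sum_{k=1}^n \mathbb{1}_{\mathcal{A}'}(a_k)} \\
&= nc\, p_A(y)\, p_A(z) \sum_{a_3^n} \frac{\prod_{k=3}^n p_A(a_k)}{1 + \mathbb{1}_{\mathcal{A}'}(z) + \sum_{k=3}^n \mathbb{1}_{\mathcal{A}'}(a_k)},
\end{aligned}
$$

where $a_3^n$ are the last $n - 2$ components of $a^n$. Observe that $p(y, z)$ separates into a function of $z$ and a function of $y$. Independence is thus established.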

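C. Proof of Lemma 3, exemplified for $\mathcal{E}_3$
We analyze the probability of $\mathcal{E}_3$ as follows.

$$
\begin{aligned}
\mathcal{E}_3 &= \bigl\{ \bigl(U^n(1), Z_1^n(1, L_1^{(1,1,m_2)}), Z_2^n(1, L_2^{(1,1,m_2)}), X^n(1, L_1^{(1,1,m_2)}, L_2^{(1,1,m_2)}, m_3), Y^n\bigr) \in \mathcal{T}_\epsilon^{(n)} \\
&\qquad \text{for some } m_2 \ne 1,\ m_3 \bigr\} \\
&\subseteq \bigl\{ \bigl(U^n(1), Z_1^n(1, L_1^{(1,1,m_2)}), Z_2^n(1, l_2), X^n(1, L_1^{(1,1,m_2)}, l_2, m_3), Y^n\bigr) \in \mathcal{T}_\epsilon^{(n)} \\
&\qquad \text{for some } m_2 \ne 1,\ m_3,\ l_2 \notin \mathcal{L}_2(1) \bigr\}.
\end{aligned}
$$

Define the event $\mathcal{E}_{eq} = \{L_1^{(1,1,m_2)} = L_1^{(1,1,1)}\}$, which allows us to write $P(\mathcal{E}_3 \mid \mathcal{E}_e^c) = P(\mathcal{E}_3 \cap \mathcal{E}_{eq} \mid \mathcal{E}_e^c) + P(\mathcal{E}_3 \cap \mathcal{E}_{eq}^c \mid \mathcal{E}_e^c)$. We consider both terms separately.

$$
\begin{aligned}
\mathcal{E}_3 \cap \mathcal{E}_{eq} \subseteq \bigl\{ &\bigl(U^n(1), Z_1^n(1, L_1^{(1,1,1)}), Z_2^n(1, l_2), X^n(1, L_1^{(1,1,1)}, l_2, m_3), Y^n\bigr) \in \mathcal{T}_\epsilon^{(n)} \\
&\text{for some } l_2 \notin \mathcal{L}_2(1),\ m_3 \bigr\}.
\end{aligned}
$$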



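Thus,

$$
\begin{aligned}
P(\mathcal{E}_3 \cap \mathcal{E}_{eq} \mid \mathcal{E}_e^c)
&\le \sum_{(u^n, z_1^n, y^n) \in \mathcal{T}_\epsilon^{(n)}} P\bigl(U^n(1) = u^n,\ Z_1^n(1, L_1^{(1,1,1)}) = z_1^n,\ Y^n = y^n \bigm| \mathcal{E}_e^c\bigr) \\
&\qquad \cdot \sum_{l_2 \notin \mathcal{L}_2(1)} \sum_{m_3 = 1}^{2^{nR_3}} P\bigl((u^n, z_1^n, Z_2^n(1, l_2), X^n(1, L_1^{(1,1,1)}, l_2, m_3), y^n) \in \mathcal{T}_\epsilon^{(n)} \\
&\qquad\qquad \bigm| U^n(1) = u^n,\ Z_1^n(1, L_1^{(1,1,1)}) = z_1^n,\ Y^n = y^n,\ \mathcal{E}_e^c\bigr) \\
&\le 2^{n(\tilde{R}_2 + R_3)} P^*,
\end{aligned}
$$

where $P^*$ is shorthand for the last $P(\cdot)$ expression. Continue with

$$
\begin{aligned}
P^* &= \sum_{(z_2^n, x^n) \in \mathcal{T}_\epsilon^{(n)}(Z_2, X \mid u^n, z_1^n, y^n)} P\bigl(Z_2^n(1, l_2) = z_2^n,\ X^n(1, L_1^{(1,1,1)}, l_2, m_3) = x^n \\
&\qquad \bigm| U^n(1) = u^n,\ Z_1^n(1, L_1^{(1,1,1)}) = z_1^n,\ Y^n = y^n,\ \mathcal{E}_e^c\bigr) \\
&\overset{(a)}{=} \sum_{(z_2^n, x^n) \in \mathcal{T}_\epsilon^{(n)}(Z_2, X \mid u^n, z_1^n, y^n)} p(z_2^n \mid u^n)\, p(x^n \mid z_1^n, z_2^n, u^n) \\
&\le \bigl|\mathcal{T}_\epsilon^{(n)}(Z_2, X \mid u^n, z_1^n, y^n)\bigr| \cdot 2^{-n(H(Z_2 \mid U) - \delta(\epsilon))} \cdot 2^{-n(H(X \mid Z_1, Z_2, U) - \delta(\epsilon))} \\
&\le 2^{n(H(X, Z_2 \mid Z_1, Y, U) - H(Z_2 \mid U) - H(X \mid Z_1, Z_2, U) + \delta(\epsilon))} \\
&= 2^{n(-H(Y \mid Z_1, U) - I(Z_1; Z_2 \mid U) + \delta(\epsilon))}.
\end{aligned}
$$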

In step (a), we have used the fact that $l_2 \notin \mathcal{L}_2(1)$, and therefore, $Z_2^n(1, l_2)$ relates to a bin other than the first one. It is independent of the conditions $Y^n = y^n$ and $\mathcal{E}_e^c$, both of which relate only to the $(1,1)$ bin for $m_0 = 1$. A similar argument applies to the second term.

Substituting back into the previous chain of inequalities implies that $P(\mathcal{E}_3 \cap \mathcal{E}_{eq} \mid \mathcal{E}_e^c) \to 0$ as $n \to \infty$ if inequality (26) holds.
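The last equality in the bound on $P^*$ uses only the fact that $Y$, $Z_1$, and $Z_2$ are functions of $X$. The sketch below checks it numerically for a random $p(u,x)$ and one hypothetical choice of functions on $X \in \{0, \ldots, 7\}$; the functions, alphabets, helper names, and seed are arbitrary.

```python
import math
import random


def entropy(pmf, idx):
    """Entropy in bits of the marginal of `pmf` on the coordinates in idx."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_entropy(pmf, a, b):
    """H(A | B) = H(A, B) - H(B)."""
    return entropy(pmf, a + b) - entropy(pmf, b)


# Hypothetical deterministic maps on X in {0, ..., 7}.
def y_of(x):
    return x % 4


def z1_of(x):
    return x // 2


def z2_of(x):
    return (x % 2) + 2 * (x // 4)


rng = random.Random(3)
weights = {(u, x): rng.random() for u in range(2) for x in range(8)}
total = sum(weights.values())
# Coordinates: 0 = U, 1 = Z1, 2 = Z2, 3 = X, 4 = Y.
pmf = {(u, z1_of(x), z2_of(x), x, y_of(x)): w / total
       for (u, x), w in weights.items()}
U, Z1, Z2, X, Y = [0], [1], [2], [3], [4]

lhs = (cond_entropy(pmf, X + Z2, Z1 + Y + U) - cond_entropy(pmf, Z2, U)
       - cond_entropy(pmf, X, Z1 + Z2 + U))
i_z1_z2 = cond_entropy(pmf, Z1, U) - cond_entropy(pmf, Z1, Z2 + U)
rhs = -cond_entropy(pmf, Y, Z1 + U) - i_z1_z2
print(abs(lhs - rhs) < 1e-9)  # True
```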

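Next, consider

$$
\begin{aligned}
\mathcal{E}_3 \cap \mathcal{E}_{eq}^c \subseteq \bigl\{ &\bigl(U^n(1), Z_1^n(1, l_1), Z_2^n(1, l_2), X^n(1, l_1, l_2, m_3), Y^n\bigr) \in \mathcal{T}_\epsilon^{(n)} \\
&\text{for some } l_1 \in \mathcal{L}_1(1) \setminus \{L_1^{(1,1,1)}\},\ l_2 \notin \mathcal{L}_2(1),\ m_3 \bigr\}.
\end{aligned}
$$

We argue

$$
\begin{aligned}
P(\mathcal{E}_3 \cap \mathcal{E}_{eq}^c \mid \mathcal{E}_e^c)
&\le \sum_{(u^n, y^n) \in \mathcal{T}_\epsilon^{(n)}} P\bigl(U^n(1) = u^n,\ Y^n = y^n \bigm| \mathcal{E}_e^c\bigr)
\sum_{l_1 \in \mathcal{L}_1(1) \setminus \{L_1^{(1,1,1)}\}} \ \sum_{l_2 \notin \mathcal{L}_2(1)} \ \sum_{m_3 = 1}^{2^{nR_3}} \\
&\qquad P\bigl((u^n, Z_1^n(1, l_1), Z_2^n(1, l_2), X^n(1, l_1, l_2, m_3), y^n) \in \mathcal{T}_\epsilon^{(n)} \bigm| U^n(1) = u^n,\ Y^n = y^n,\ \mathcal{E}_e^c\bigr) \\
&\le 2^{n(\tilde{R}_1 - R_1 + \tilde{R}_2 + R_3)} P^*,
\end{aligned}
$$

where $P^*$ represents the last $P(\cdot)$ expression. Finally,

$$
\begin{aligned}
P^* &= \sum_{(z_1^n, z_2^n, x^n) \in \mathcal{T}_\epsilon^{(n)}(Z_1, Z_2, X \mid u^n, y^n)} P\bigl(Z_1^n(1, l_1) = z_1^n,\ Z_2^n(1, l_2) = z_2^n,\ X^n(1, l_1, l_2, m_3) = x^n \\
&\qquad \bigm| U^n(1) = u^n,\ Y^n = y^n,\ \mathcal{E}_e^c\bigr) \\
&= \sum_{(z_1^n, z_2^n, x^n) \in \mathcal{T}_\epsilon^{(n)}(Z_1, Z_2, X \mid u^n, y^n)} \ \sum_{\{z_2^n(l_2')\}_{l_2' \in \mathcal{L}_2(1)}} P\bigl(Z_2^n(1, l_2') = z_2^n(l_2') \text{ for all } l_2' \in \mathcal{L}_2(1) \bigm| U^n(1) = u^n,\ Y^n = y^n,\ \mathcal{E}_e^c\bigr) \\
&\qquad \cdot P\bigl(Z_1^n(1, l_1) = z_1^n,\ Z_2^n(1, l_2) = z_2^n,\ X^n(1, l_1, l_2, m_3) = x^n \bigm| U^n(1) = u^n,\ Y^n = y^n, \\
&\qquad\qquad Z_2^n(1, l_2') = z_2^n(l_2') \text{ for all } l_2' \in \mathcal{L}_2(1),\ \mathcal{E}_e^c\bigr) \\
&\overset{(a)}{\le} \sum_{(z_1^n, z_2^n, x^n) \in \mathcal{T}_\epsilon^{(n)}(Z_1, Z_2, X \mid u^n, y^n)} p(z_1^n \mid u^n, \mathcal{E}_e^c)\, p(z_2^n \mid u^n)\, p(x^n \mid z_1^n, z_2^n, u^n) \\
&\overset{(b)}{\le} 2^{n(H(X, Z_1, Z_2 \mid Y, U) - H(Z_1 \mid U) - H(Z_2 \mid U) - H(X \mid Z_1, Z_2, U) + \delta(\epsilon))} \\
&= 2^{n(-H(Y \mid U) - I(Z_1; Z_2 \mid U) + \delta(\epsilon))}.
\end{aligned}
$$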

Here, (a) uses the fact that for the $l_1$ indices in question, $Z_1^n(1, l_1)$ is independent of $Y^n$. This is a consequence of the independence between the selected $Z_1^n(1, L_1^{(1,1,1)})$ and the other (non-selected) $Z_1^n(1, l_1)$, due to Lemma 2. The lemma applies because the event is conditioned (1) on $\mathcal{E}_e^c$, which ensures that $L_1^{(1,1,1)}$ is picked uniformly as required by the lemma, and (2) on $Z_2^n(1, l_2')$ for all $l_2' \in \mathcal{L}_2(1)$, which provides the qualifying set $\mathcal{A}'$ of the lemma.
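Step (b) follows from

$$
\begin{aligned}
p(z_1^n \mid u^n, \mathcal{E}_e^c) &= p(z_1^n \mid u^n) \cdot \frac{P(\mathcal{E}_e^c \mid z_1^n, u^n)}{P(\mathcal{E}_e^c \mid u^n)} \\
&\le p(z_1^n \mid u^n) \cdot \frac{1}{P(\mathcal{E}_e^c \mid u^n)} \\
&\le p(z_1^n \mid u^n) \cdot \frac{1}{1 - 2^{-\delta n}} \\
&\le 2^{-n(H(Z_1 \mid U) - \epsilon)} \cdot 2^{n\delta'} \\
&= 2^{-n(H(Z_1 \mid U) - \epsilon - \delta')}.
\end{aligned}
$$

Here, $\delta$ is the minimum slack of the three conditions for $\mathcal{E}_e^c$ in Lemma 1. Note that for any $\delta, \delta' > 0$, we can find an $N$ such that

$$
\frac{1}{1 - 2^{-\delta n}} \le 2^{n\delta'} \quad \text{for all } n > N.
$$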
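The existence of $N$ is elementary, since the left side decreases and the right side increases in $n$. The sketch below finds the smallest $n$ at which the inequality starts to hold for given slacks; `smallest_n` and the sample values of $\delta$ and $\delta'$ are illustrative.

```python
def smallest_n(delta: float, delta_prime: float, n_max: int = 10**6) -> int:
    """Smallest n >= 1 with 1 / (1 - 2^(-delta n)) <= 2^(n delta_prime).

    The left side decreases and the right side increases in n, so once the
    inequality holds it keeps holding for every larger n.
    """
    for n in range(1, n_max + 1):
        if 1.0 / (1.0 - 2.0 ** (-delta * n)) <= 2.0 ** (n * delta_prime):
            return n
    raise ValueError("no such n up to n_max")


print(smallest_n(1.0, 1.0))  # 1
print(smallest_n(0.1, 0.01))
```

If $n_0$ is the returned value, then any $N \ge n_0 - 1$ satisfies the displayed condition.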

We conclude that $P(\mathcal{E}_3 \cap \mathcal{E}_{eq}^c \mid \mathcal{E}_e^c) \to 0$ as $n \to \infty$ if

$$
\tilde{R}_1 - R_1 + \tilde{R}_2 + R_3 < H(Y \mid U) + I(Z_1; Z_2 \mid U) - \delta(\epsilon).
$$

This is implied by (27), which stems from analyzing $\mathcal{E}_4$, and may thus be omitted.



